vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
Notes:
- Icon not provided. Fetching the list of models is recommended so LibreChat picks up the models available on your local vLLM server.
- The `titleMessageRole` setting is important, as some local LLMs will not accept the default "system" role for title messages.
- This configuration assumes you have a vLLM server running locally at the specified baseURL.
- name: "vLLM"
apiKey: "vllm"
baseURL: "http://127.0.0.1:8023/v1"
models:
default: ['google/gemma-3-27b-it']
fetch: true
titleConvo: true
titleModel: "current_model"
titleMessageRole: "user"
summarize: false
summaryModel: "current_model"
forcePrompt: false
The configuration above connects LibreChat to a local vLLM server running on port 8023. It uses Gemma 3 27B (`google/gemma-3-27b-it`) as the default model, but will also fetch all available models from your vLLM server.
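If you want to confirm what the model fetch will see, you can query the vLLM server's OpenAI-compatible `/v1/models` endpoint yourself. The sketch below is only an illustration using the `openai` Python client, assuming the same baseURL and placeholder key as the config above.

```python
# Minimal sketch: list the models exposed by a local vLLM server,
# mirroring what LibreChat retrieves when `fetch: true` is set.
# Assumes the baseURL and placeholder apiKey from the config above.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8023/v1",  # same baseURL as the LibreChat config
    api_key="vllm",                        # placeholder; a local vLLM server ignores it by default
)

for model in client.models.list():
    print(model.id)  # e.g. google/gemma-3-27b-it
```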
Key Configuration Options
- `apiKey`: A simple placeholder value for vLLM (local deployments typically don’t require authentication)
- `baseURL`: The URL where your vLLM server is running
- `titleMessageRole`: Set to “user” instead of the default “system”, as some local LLMs don’t support system messages
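To see why the title role matters, here is a hedged sketch of the kind of request sent when generating a conversation title: with `titleMessageRole: "user"`, the title instruction goes out as a user message rather than a system message, which models that reject system roles (such as Gemma) handle correctly. The prompt wording below is illustrative, not LibreChat’s actual title prompt.

```python
# Illustrative sketch of a title-generation request with the instruction sent
# as a "user" message (titleMessageRole: "user") instead of "system".
# The prompt text here is hypothetical, not LibreChat's exact wording.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8023/v1", api_key="vllm")

response = client.chat.completions.create(
    model="google/gemma-3-27b-it",
    messages=[
        # With the default titleMessageRole ("system"), this instruction would be
        # sent with role="system", which some local models reject or ignore.
        {"role": "user", "content": "Write a concise title for this conversation: ..."},
    ],
    max_tokens=16,
)
print(response.choices[0].message.content)
```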