vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
Notes:
- Icon not provided. Fetching the list of models is recommended so LibreChat picks up the models available on your local vLLM server.
- The `titleMessageRole` setting is important, as some local LLMs will not accept the default "system" role for title messages.
- This configuration assumes you have a vLLM server running locally at the specified baseURL.
- name: "vLLM"
apiKey: "vllm"
baseURL: "http://127.0.0.1:8023/v1"
models:
default: ['google/gemma-3-27b-it']
fetch: true
titleConvo: true
titleModel: "current_model"
titleMessageRole: "user"
summarize: false
summaryModel: "current_model"
forcePrompt: false
The configuration above connects LibreChat to a local vLLM server running on port 8023. It uses Gemma 3 27B (`google/gemma-3-27b-it`) as the default model, but will also fetch all available models from your vLLM server.
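If you want to confirm what the model fetch will see, you can query the vLLM server's OpenAI-compatible `/v1/models` endpoint yourself. The sketch below is only an illustration using the `openai` Python client, assuming the same baseURL and placeholder key as the config above.

```python
# Minimal sketch: list the models exposed by a local vLLM server,
# mirroring what LibreChat retrieves when `fetch: true` is set.
# Assumes the baseURL and placeholder apiKey from the config above.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8023/v1",  # same baseURL as the LibreChat config
    api_key="vllm",                        # placeholder; a local vLLM server ignores it by default
)

for model in client.models.list():
    print(model.id)  # e.g. google/gemma-3-27b-it
```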
Key Configuration Options
- `apiKey`: A simple placeholder value for vLLM (local deployments typically don’t require authentication)
- `baseURL`: The URL where your vLLM server is running
- `titleMessageRole`: Set to “user” instead of the default “system”, as some local LLMs don’t support system messages
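To see why the title role matters, here is a hedged sketch of the kind of request sent when generating a conversation title: with `titleMessageRole: "user"`, the title instruction goes out as a user message rather than a system message, which models that reject system roles (such as Gemma) handle correctly. The prompt wording below is illustrative, not LibreChat’s actual title prompt.

```python
# Illustrative sketch of a title-generation request with the instruction sent
# as a "user" message (titleMessageRole: "user") instead of "system".
# The prompt text here is hypothetical, not LibreChat's exact wording.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8023/v1", api_key="vllm")

response = client.chat.completions.create(
    model="google/gemma-3-27b-it",
    messages=[
        # With the default titleMessageRole ("system"), this instruction would be
        # sent with role="system", which some local models reject or ignore.
        {"role": "user", "content": "Write a concise title for this conversation: ..."},
    ],
    max_tokens=16,
)
print(response.choices[0].message.content)
```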