vLLM
Example configuration for vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
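Below is a minimal sketch of the custom endpoint entry for your `librechat.yaml`, reconstructed from the notes in this guide; the endpoint name and the exact model ID (`google/gemma-3-27b-it`) are assumptions, so substitute whatever your vLLM server actually reports.

```yaml
# librechat.yaml -- sketch of a vLLM custom endpoint (assumptions noted inline)
endpoints:
  custom:
    - name: "vLLM"                          # display name; assumption
      apiKey: "vllm"                        # placeholder; local vLLM typically needs no auth
      baseURL: "http://localhost:8023/v1"   # local vLLM server from this guide
      models:
        default: ["google/gemma-3-27b-it"]  # model ID is an assumption; use your server's ID
        fetch: true                         # fetch available models from the server
      titleConvo: true
      titleMessageRole: "user"              # some local LLMs reject system-role title messages
      modelDisplayLabel: "vLLM"
```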
Notes:

- Not Known: no icon is provided, but fetching the list of models is recommended so LibreChat can discover the models available on your local vLLM server.
- The `titleMessageRole` setting is important, as some local LLMs will not accept the default "system" message role for title messages.
- This configuration assumes you have a vLLM server running locally at the specified `baseURL`.
The configuration above connects LibreChat to a local vLLM server running on port 8023. It sets Gemma 3 27B as the default model and will also fetch all models available from your vLLM server.
Key Configuration Options
- `apiKey`: A simple placeholder value for vLLM; local deployments typically don't require authentication.
- `baseURL`: The URL where your vLLM server is running.
- `titleMessageRole`: Set to "user" instead of the default "system", as some local LLMs don't support system messages.
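To sanity-check the setup, you can start a vLLM server on the matching port and confirm its OpenAI-compatible models endpoint responds; the model ID below is an assumption and should be replaced with the model you actually serve.

```bash
# Start a local vLLM OpenAI-compatible server on port 8023
# (model ID is an assumption; substitute your own).
vllm serve google/gemma-3-27b-it --port 8023

# Verify the server is up and see which models LibreChat will fetch.
curl http://localhost:8023/v1/models
```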