vLLM
Configure vLLM as a custom endpoint in LibreChat.
vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs. It exposes an OpenAI-compatible API, so you can run it locally and point LibreChat at your own server.
Configuration
Local vLLM deployments don't require authentication, so the API key is just a placeholder. Point baseURL at your running vLLM server. Add the endpoint under endpoints.custom in your librechat.yaml:
- name: "vLLM"
apiKey: "vllm"
baseURL: "http://127.0.0.1:8023/v1"
models:
default: ['google/gemma-3-27b-it']
fetch: true
titleConvo: true
titleModel: "current_model"
titleMessageRole: "user"
summarize: false
summaryModel: "current_model"Notes
- The example connects to a local vLLM server on port 8023 with Gemma 3 27B as the default. Set
baseURLto wherever your server is running. - With
fetch: true, LibreChat loads the full list of models available on your vLLM server, sodefaultis only the initial selection. titleMessageRole: "user"overrides the defaultsystemrole for title generation. Some local models reject system message roles, so sending the title prompt as a user message avoids errors.
How is this guide?