Skip to main content
LibreChat is joining ClickHouse to power the open-source Agentic Data Stack 🎉 Learn more
LibreChat

vLLM

Configure vLLM as a custom endpoint in LibreChat.

vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs. It exposes an OpenAI-compatible API, so you can run it locally and point LibreChat at your own server.

Configuration

Local vLLM deployments don't require authentication, so the API key is just a placeholder. Point baseURL at your running vLLM server. Add the endpoint under endpoints.custom in your librechat.yaml:

    - name: "vLLM"
      apiKey: "vllm"
      baseURL: "http://127.0.0.1:8023/v1"
      models:
        default: ['google/gemma-3-27b-it']
        fetch: true
      titleConvo: true
      titleModel: "current_model"
      titleMessageRole: "user"
      summarize: false
      summaryModel: "current_model"

Notes

  • The example connects to a local vLLM server on port 8023 with Gemma 3 27B as the default. Set baseURL to wherever your server is running.
  • With fetch: true, LibreChat loads the full list of models available on your vLLM server, so default is only the initial selection.
  • titleMessageRole: "user" overrides the default system role for title generation. Some local models reject system message roles, so sending the title prompt as a user message avoids errors.

How is this guide?

On this page