vLLM
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
Notes:
- Not Known: no icon is provided. Fetching the list of models is recommended so that LibreChat picks up the models available on your local vLLM server.
- The titleMessageRole setting is important because some local LLMs will not accept the system message role (the default) for title messages.
- This configuration assumes you have a vLLM server running locally at the specified baseURL.
    - name: "vLLM"
      apiKey: "vllm"
      baseURL: "http://127.0.0.1:8023/v1"
      models:
        default: ['google/gemma-3-27b-it']
        fetch: true
      titleConvo: true
      titleModel: "current_model"
      titleMessageRole: "user"
      summarize: false
      summaryModel: "current_model"
      forcePrompt: false

The configuration above connects LibreChat to a local vLLM server running on port 8023. It uses the Gemma 3 27B model as the default model, but will fetch all available models from your vLLM server.
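Before relying on `fetch: true`, it can help to confirm that the server actually answers at the configured baseURL. The following is a minimal sketch, not part of LibreChat: it assumes the vLLM server exposes the OpenAI-compatible `/models` route under the baseURL and returns a JSON object with a `data` list of entries that each carry an `id`. The function names `models_url`, `parse_model_ids`, and `list_models` are hypothetical helpers introduced only for illustration.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Append the model-listing route to a baseURL such as http://127.0.0.1:8023/v1."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style list response (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, api_key: str, timeout: float = 5.0) -> list:
    """Query the server and return the ids of the models it reports."""
    request = urllib.request.Request(
        models_url(base_url),
        # The placeholder key is sent as a bearer token; a local server started
        # without authentication is assumed to ignore it.
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

With the server from the configuration running, calling `list_models("http://127.0.0.1:8023/v1", "vllm")` should return the ids that LibreChat would offer in its model selector. A connection error at this point usually means the baseURL or port does not match the running server.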
Key Configuration Options
- apiKey: A simple placeholder value for vLLM (local deployments typically don’t require authentication)
- baseURL: The URL where your vLLM server is running
- titleMessageRole: Set to “user” instead of the default “system” as some local LLMs don’t support system messages
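Whether a given model needs `titleMessageRole: "user"` can be checked directly against the server. The sketch below is an illustration under assumptions: it sends a one-message request to the OpenAI-compatible `/chat/completions` route and treats any HTTP error as "role not accepted". The exact shape of the request LibreChat sends for titles is not shown in this guide, so the single-message probe is a simplification, and `build_probe` and `accepts_role` are hypothetical names.

```python
import json
import urllib.error
import urllib.request


def build_probe(model: str, role: str) -> dict:
    """Build a tiny chat request whose only message uses the given role."""
    return {
        "model": model,
        "messages": [{"role": role, "content": "Reply with the single word: ok"}],
        "max_tokens": 8,
    }


def accepts_role(base_url: str, api_key: str, model: str, role: str) -> bool:
    """Return True if the server answers the probe with HTTP 200."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(build_probe(model, role)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Assumption: a rejected role surfaces as a 4xx/5xx response.
        return False
```

If `accepts_role(base_url, "vllm", model, "system")` returns False while the same call with `"user"` returns True, keeping `titleMessageRole: "user"` is the safer choice for that model.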