# vLLM

> vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.

**Notes:**

- **Not Known:** no icon is provided for this endpoint. Fetching the model list is recommended so that LibreChat shows the models actually available on your local vLLM server.

- Setting `titleMessageRole` is important: title generation uses the `system` role by default, and some local LLMs do not accept system messages.

- This configuration assumes you have a vLLM server running locally at the specified baseURL.

```yaml
    - name: "vLLM"
      apiKey: "vllm"
      baseURL: "http://127.0.0.1:8023/v1"
      models:
        default: ['google/gemma-3-27b-it']
        fetch: true
      titleConvo: true
      titleModel: "current_model"
      titleMessageRole: "user"
      summarize: false
      summaryModel: "current_model"
```

The configuration above connects LibreChat to a local vLLM server listening on port 8023. It lists `google/gemma-3-27b-it` (Gemma 3 27B) as the default model and, because `fetch` is `true`, also retrieves all models available on your vLLM server.
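
The endpoint entry is a list item, so it has to be nested inside the custom endpoints list of `librechat.yaml`. The sketch below shows one way the surrounding file might look. The `version` value is a placeholder and is not taken from this page, so use the one your LibreChat release expects. The rest of the entry is copied unchanged from the example above.

```yaml
# librechat.yaml (sketch)
# NOTE: the version value below is an assumed placeholder; check your LibreChat release.
version: 1.2.1
endpoints:
  custom:
    # vLLM endpoint entry, identical to the example above
    - name: "vLLM"
      apiKey: "vllm"
      baseURL: "http://127.0.0.1:8023/v1"
      models:
        default: ['google/gemma-3-27b-it']
        fetch: true
      titleConvo: true
      titleModel: "current_model"
      titleMessageRole: "user"
      summarize: false
      summaryModel: "current_model"
```

If LibreChat runs in a container while vLLM runs on the host, `127.0.0.1` refers to the container itself, so the `baseURL` host would likely need to be adjusted for your setup.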

## Key Configuration Options

- `apiKey`: A simple placeholder value for vLLM (local deployments typically don't require authentication)
- `baseURL`: The URL where your vLLM server is running
- `titleMessageRole`: Set to `"user"` instead of the default `"system"`, as some local LLMs don't support system messages
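
If your vLLM server does require a key (for example, when it was started with vLLM's `--api-key` option), the placeholder can be swapped for an environment variable reference. The following is a minimal sketch under that assumption. The variable name `VLLM_API_KEY` is hypothetical here and would need to be defined in LibreChat's environment with the same value the server expects.

```yaml
    # Sketch: same endpoint, but reading the key from an environment variable.
    # VLLM_API_KEY is an assumed variable name, not part of the original example.
    - name: "vLLM"
      apiKey: "${VLLM_API_KEY}"
      baseURL: "http://127.0.0.1:8023/v1"
      models:
        default: ['google/gemma-3-27b-it']
        fetch: true
      titleConvo: true
      titleModel: "current_model"
      titleMessageRole: "user"
```

Keeping the key in the environment rather than in `librechat.yaml` avoids committing a secret alongside the configuration.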
