Speech Configuration
Overview
The speech object allows you to configure Text-to-Speech (TTS) and Speech-to-Text (STT) providers directly in your librechat.yaml configuration file. This enables server-side speech services without requiring users to configure their own API keys.
Fields under speech:
- tts: Text-to-Speech provider configurations
- stt: Speech-to-Text provider configurations
- speechTab: Default UI settings for speech features
Notes:
- Multiple providers can be configured simultaneously
- Users can select their preferred provider from the available options
- API keys in the config file should use environment variable references for security
Example
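A minimal sketch of the overall layout, assuming tts, stt, and speechTab nest directly under speech in librechat.yaml. The environment variable names and the voice list are illustrative placeholders, not values prescribed by this guide.

```yaml
# librechat.yaml (sketch; variable names below are assumptions)
speech:
  tts:
    openai:
      apiKey: '${TTS_API_KEY}'   # environment variable reference, not a literal key
      model: 'tts-1'
      voices: ['alloy', 'echo']
  stt:
    openai:
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'
  speechTab:
    conversationMode: false
    advancedMode: false
```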
tts
The tts object configures Text-to-Speech providers. Multiple providers can be configured, and users can choose which one to use.
openai
OpenAI TTS configuration using models like tts-1 or tts-1-hd.
| Key | Type | Description | Required |
|---|---|---|---|
| url | String | Custom API URL (optional). Use for OpenAI-compatible endpoints. | |
| apiKey | String | OpenAI API key. Use environment variable reference. | Required |
| model | String | TTS model to use (e.g., "tts-1", "tts-1-hd"). | Required |
| voices | Array of Strings | Available voice options for users to select. | Required |
Example:
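One way the OpenAI TTS block might look, using only the keys from the table above. The variable name TTS_API_KEY and the commented-out URL are hypothetical; the voices shown are a sample, and you would list whichever voices you want to expose.

```yaml
speech:
  tts:
    openai:
      # url: 'https://example.com/v1/audio/speech'  # optional; hypothetical OpenAI-compatible endpoint
      apiKey: '${TTS_API_KEY}'                       # assumed variable name
      model: 'tts-1'                                 # or 'tts-1-hd'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
```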
azureOpenAI
Azure OpenAI TTS configuration.
| Key | Type | Description | Required |
|---|---|---|---|
| instanceName | String | Azure OpenAI instance name. | Required |
| apiKey | String | Azure OpenAI API key. | Required |
| deploymentName | String | The deployment name for the TTS model. | Required |
| apiVersion | String | Azure OpenAI API version. | Required |
| model | String | TTS model identifier. | Required |
| voices | Array of Strings | Available voice options. | Required |
Example:
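A sketch of an Azure OpenAI TTS entry built from the keys in the table above. The instance name, deployment name, API version, and variable name are all placeholders; substitute the values from your own Azure resource.

```yaml
speech:
  tts:
    azureOpenAI:
      instanceName: 'my-instance'            # placeholder
      apiKey: '${AZURE_TTS_API_KEY}'         # assumed variable name
      deploymentName: 'my-tts-deployment'    # placeholder
      apiVersion: '2024-02-15-preview'       # placeholder; use a version your resource supports
      model: 'tts-1'
      voices: ['alloy', 'nova']
```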
elevenlabs
ElevenLabs TTS configuration for high-quality voice synthesis.
| Key | Type | Description | Required |
|---|---|---|---|
| url | String | Custom API URL (optional). | |
| websocketUrl | String | WebSocket URL for streaming (optional). | |
| apiKey | String | ElevenLabs API key. | Required |
| model | String | ElevenLabs model (e.g., "eleven_multilingual_v2"). | Required |
| voices | Array of Strings | Voice IDs available for selection. | Required |
| voice_settings | Object | Voice customization settings (optional). | |
| pronunciation_dictionary_locators | Array of Strings | Pronunciation dictionary IDs (optional). | |
voice_settings Sub-keys:
| Key | Type | Description | Example |
|---|---|---|---|
| similarity_boost | Number | Voice similarity enhancement (0-1). | |
| stability | Number | Voice stability (0-1). | |
| style | Number | Style exaggeration (0-1). | |
| use_speaker_boost | Boolean | Enable speaker boost. | |
Example:
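A sketch of an ElevenLabs entry that includes the optional voice_settings object. The voice IDs, dictionary ID, numeric values, and variable name are illustrative assumptions; the numeric values only need to fall within the 0-1 ranges listed above.

```yaml
speech:
  tts:
    elevenlabs:
      apiKey: '${ELEVENLABS_API_KEY}'          # assumed variable name
      model: 'eleven_multilingual_v2'
      voices: ['VOICE_ID_1', 'VOICE_ID_2']     # placeholders for real voice IDs
      voice_settings:
        similarity_boost: 0.75                 # 0-1
        stability: 0.5                         # 0-1
        style: 0.0                             # 0-1
        use_speaker_boost: true
      # pronunciation_dictionary_locators: ['DICTIONARY_ID']  # optional; placeholder ID
```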
localai
LocalAI TTS configuration for self-hosted speech synthesis.
| Key | Type | Description | Required |
|---|---|---|---|
| url | String | LocalAI server URL. | Required |
| apiKey | String | API key if authentication is enabled (optional). | |
| voices | Array of Strings | Available voice models. | Required |
| backend | String | TTS backend to use (e.g., "piper"). | Required |
Example:
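A sketch of a LocalAI entry, assuming a server reachable on localhost. The URL, voice model name, and variable name are placeholders; only the piper backend value comes from the table above.

```yaml
speech:
  tts:
    localai:
      url: 'http://localhost:8080/tts'   # placeholder; point at your LocalAI server
      # apiKey: '${LOCALAI_API_KEY}'     # only if authentication is enabled; assumed name
      voices: ['en-us-amy-low']          # placeholder voice model name
      backend: 'piper'
```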
stt
The stt object configures Speech-to-Text providers.
openai
OpenAI Whisper STT configuration.
| Key | Type | Description | Required |
|---|---|---|---|
| url | String | Custom API URL (optional). Use for OpenAI-compatible endpoints. | |
| apiKey | String | OpenAI API key. Use environment variable reference. | Required |
| model | String | STT model to use (e.g., "whisper-1"). | Required |
Example:
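A sketch of the OpenAI STT block, using the whisper-1 model named in the table. The variable name and the commented-out URL are hypothetical.

```yaml
speech:
  stt:
    openai:
      # url: 'https://example.com/v1/audio/transcriptions'  # optional; hypothetical compatible endpoint
      apiKey: '${STT_API_KEY}'                               # assumed variable name
      model: 'whisper-1'
```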
azureOpenAI
Azure OpenAI Whisper STT configuration.
| Key | Type | Description | Required |
|---|---|---|---|
| instanceName | String | Azure OpenAI instance name. | Required |
| apiKey | String | Azure OpenAI API key. | Required |
| deploymentName | String | The deployment name for the Whisper model. | Required |
| apiVersion | String | Azure OpenAI API version. | Required |
Example:
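A sketch of an Azure OpenAI STT entry with the four keys listed above. Every value here is a placeholder to be replaced with details from your own Azure deployment.

```yaml
speech:
  stt:
    azureOpenAI:
      instanceName: 'my-instance'              # placeholder
      apiKey: '${AZURE_STT_API_KEY}'           # assumed variable name
      deploymentName: 'my-whisper-deployment'  # placeholder
      apiVersion: '2024-02-15-preview'         # placeholder; use a version your resource supports
```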
speechTab
The speechTab object configures default UI settings for speech features. These settings control what users see by default in the speech settings panel.
| Key | Type | Description | Example |
|---|---|---|---|
| conversationMode | Boolean | Enable conversation mode by default. | false |
| advancedMode | Boolean | Show advanced speech settings by default. | false |
| speechToText | Boolean or Object | Enable STT by default, or configure detailed STT settings. | false |
| textToSpeech | Boolean or Object | Enable TTS by default, or configure detailed TTS settings. | false |
speechToText (Object format)
When using an object instead of a boolean:
| Key | Type | Description | Example |
|---|---|---|---|
| engineSTT | String | Default STT engine. | |
| languageSTT | String | Default language for STT. | |
| autoTranscribeAudio | Boolean | Automatically transcribe audio messages. | |
| decibelValue | Number | Decibel threshold for voice detection. | |
| autoSendText | Number | Delay in ms before auto-sending transcribed text (0 to disable). | |
textToSpeech (Object format)
When using an object instead of a boolean:
| Key | Type | Description | Example |
|---|---|---|---|
| engineTTS | String | Default TTS engine. | |
| voice | String | Default voice selection. | |
| languageTTS | String | Default language for TTS. | |
| automaticPlayback | Boolean | Automatically play TTS responses. | |
| playbackRate | Number | Default playback speed (1.0 = normal). | |
| cacheTTS | Boolean | Cache TTS audio for repeated playback. | |
Example:
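A sketch of speechTab using the object form for both speechToText and textToSpeech. The engine names, language strings, and decibel value are assumptions chosen for illustration; check which values your LibreChat version accepts before copying them.

```yaml
speech:
  speechTab:
    conversationMode: true
    advancedMode: false
    speechToText:
      engineSTT: 'external'          # assumed engine name
      languageSTT: 'English (US)'    # assumed language label
      autoTranscribeAudio: true
      decibelValue: -45              # illustrative threshold
      autoSendText: 0                # 0 disables auto-send
    textToSpeech:
      engineTTS: 'external'          # assumed engine name
      voice: 'alloy'
      languageTTS: 'en'              # assumed language code
      automaticPlayback: true
      playbackRate: 1.0              # 1.0 = normal speed
      cacheTTS: true
```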
Complete Example
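The following sketch combines one TTS provider, one STT provider, and speechTab defaults in a single speech block. It is an illustration assembled from the keys documented above, not an official reference configuration; the variable names and the engine names are assumptions.

```yaml
# librechat.yaml (sketch)
speech:
  tts:
    openai:
      apiKey: '${TTS_API_KEY}'       # assumed variable name
      model: 'tts-1'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
  stt:
    openai:
      apiKey: '${STT_API_KEY}'       # assumed variable name
      model: 'whisper-1'
  speechTab:
    conversationMode: false
    advancedMode: false
    speechToText:
      engineSTT: 'external'          # assumed engine name
      autoTranscribeAudio: true
      autoSendText: 0                # 0 disables auto-send
    textToSpeech:
      engineTTS: 'external'          # assumed engine name
      voice: 'alloy'
      automaticPlayback: false
      playbackRate: 1.0
```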
Notes
- Always use environment variable references (e.g., ${API_KEY}) for API keys in configuration files
- Multiple TTS providers can be configured; users select their preferred option in the UI
- The speechTab settings define defaults that users can override in their personal settings
- For detailed feature documentation, see Speech to Text & Text to Speech