Speech Configuration
Overview
The speech object allows you to configure Text-to-Speech (TTS) and Speech-to-Text (STT) providers directly in your librechat.yaml configuration file. This enables server-side speech services without requiring users to configure their own API keys.
Fields under speech:
- tts: Text-to-Speech provider configurations
- stt: Speech-to-Text provider configurations
- speechTab: Default UI settings for speech features
Notes:
- Multiple providers can be configured simultaneously
- Users can select their preferred provider from the available options
- API keys in the config file should use environment variable references for security
Example
speech:
  tts:
    openai:
      apiKey: "${TTS_API_KEY}"
      model: "tts-1"
      voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    elevenlabs:
      apiKey: "${ELEVENLABS_API_KEY}"
      model: "eleven_multilingual_v2"
      voices: ["voice-id-1", "voice-id-2"]
  stt:
    openai:
      apiKey: "${STT_API_KEY}"
      model: "whisper-1"
  speechTab:
    conversationMode: true
    advancedMode: false
    speechToText: true
    textToSpeech: true

tts
The tts object configures Text-to-Speech providers. Multiple providers can be configured, and users can choose which one to use.
openai
OpenAI TTS configuration using models like tts-1 or tts-1-hd.
| Key | Type | Description | Required |
|---|---|---|---|
| url | String | Custom API URL (optional). Use for OpenAI-compatible endpoints. | |
| apiKey | String | OpenAI API key. Use environment variable reference. | Required |
| model | String | TTS model to use (e.g., "tts-1", "tts-1-hd"). | Required |
| voices | Array of Strings | Available voice options for users to select. | Required |
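As a minimal sketch of the optional url key, the fragment here points the openai TTS provider at an OpenAI-compatible server. The host, port, and path are hypothetical placeholders, and whether your server expects a base URL or the full speech endpoint path is an assumption to verify against that server's documentation.

```yaml
speech:
  tts:
    openai:
      # Hypothetical self-hosted, OpenAI-compatible endpoint (assumption).
      url: "http://localhost:8000/v1/audio/speech"
      apiKey: "${TTS_API_KEY}"
      model: "tts-1"
      # List only the voices the compatible server actually provides.
      voices: ["alloy", "echo"]
```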
Example:
tts:
  openai:
    apiKey: "${TTS_API_KEY}"
    model: "tts-1"
    voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]

azureOpenAI
Azure OpenAI TTS configuration.
| Key | Type | Description | Required |
|---|---|---|---|
| instanceName | String | Azure OpenAI instance name. | Required |
| apiKey | String | Azure OpenAI API key. | Required |
| deploymentName | String | The deployment name for the TTS model. | Required |
| apiVersion | String | Azure OpenAI API version. | Required |
| model | String | TTS model identifier. | Required |
| voices | Array of Strings | Available voice options. | Required |
Example:
tts:
  azureOpenAI:
    instanceName: "my-azure-instance"
    apiKey: "${AZURE_TTS_API_KEY}"
    deploymentName: "tts-deployment"
    apiVersion: "2024-02-15-preview"
    model: "tts-1"
    voices: ["alloy", "echo", "nova"]

elevenlabs
ElevenLabs TTS configuration for high-quality voice synthesis.
| Key | Type | Description | Required |
|---|---|---|---|
| url | String | Custom API URL (optional). | |
| websocketUrl | String | WebSocket URL for streaming (optional). | |
| apiKey | String | ElevenLabs API key. | Required |
| model | String | ElevenLabs model (e.g., "eleven_multilingual_v2"). | Required |
| voices | Array of Strings | Voice IDs available for selection. | Required |
| voice_settings | Object | Voice customization settings (optional). | |
| pronunciation_dictionary_locators | Array of Strings | Pronunciation dictionary IDs (optional). | |
voice_settings Sub-keys:
| Key | Type | Description | Example |
|---|---|---|---|
| similarity_boost | Number | Voice similarity enhancement (0-1). | |
| stability | Number | Voice stability (0-1). | |
| style | Number | Style exaggeration (0-1). | |
| use_speaker_boost | Boolean | Enable speaker boost. | |
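The optional ElevenLabs keys from the two tables above can be combined as in the following sketch. The numeric values are illustrative rather than recommended settings, and the dictionary ID is a hypothetical placeholder written as a string, as the table describes.

```yaml
speech:
  tts:
    elevenlabs:
      apiKey: "${ELEVENLABS_API_KEY}"
      model: "eleven_multilingual_v2"
      voices: ["21m00Tcm4TlvDq8ikWAM"]
      voice_settings:
        stability: 0.5          # 0-1
        similarity_boost: 0.75  # 0-1
        style: 0.0              # 0-1, no style exaggeration
        use_speaker_boost: true
      # Hypothetical dictionary ID; replace with one from your account.
      pronunciation_dictionary_locators: ["dictionary-id-1"]
```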
Example:
tts:
  elevenlabs:
    apiKey: "${ELEVENLABS_API_KEY}"
    model: "eleven_multilingual_v2"
    voices: ["21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld"]
    voice_settings:
      stability: 0.5
      similarity_boost: 0.75
      use_speaker_boost: true

localai
LocalAI TTS configuration for self-hosted speech synthesis.
| Key | Type | Description | Required |
|---|---|---|---|
| url | String | LocalAI server URL. | Required |
| apiKey | String | API key if authentication is enabled (optional). | |
| voices | Array of Strings | Available voice models. | Required |
| backend | String | TTS backend to use (e.g., "piper"). | Required |
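If the LocalAI server has authentication enabled, the optional apiKey can be supplied the same way as for the hosted providers. In this sketch the environment variable name LOCALAI_API_KEY is a hypothetical choice, not a name defined by the source.

```yaml
speech:
  tts:
    localai:
      url: "http://localhost:8080"
      # Only needed when the LocalAI server requires authentication.
      apiKey: "${LOCALAI_API_KEY}"  # hypothetical variable name
      voices: ["en-us-amy-low"]
      backend: "piper"
```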
Example:
tts:
  localai:
    url: "http://localhost:8080"
    voices: ["en-us-amy-low", "en-us-danny-low"]
    backend: "piper"

stt
The stt object configures Speech-to-Text providers.
openai
OpenAI Whisper STT configuration.
| Key | Type | Description | Required |
|---|---|---|---|
| url | String | Custom API URL (optional). Use for OpenAI-compatible endpoints. | |
| apiKey | String | OpenAI API key. Use environment variable reference. | Required |
| model | String | STT model to use (e.g., "whisper-1"). | Required |
Example:
stt:
  openai:
    apiKey: "${STT_API_KEY}"
    model: "whisper-1"

azureOpenAI
Azure OpenAI Whisper STT configuration.
| Key | Type | Description | Required |
|---|---|---|---|
| instanceName | String | Azure OpenAI instance name. | Required |
| apiKey | String | Azure OpenAI API key. | Required |
| deploymentName | String | The deployment name for the Whisper model. | Required |
| apiVersion | String | Azure OpenAI API version. | Required |
Example:
stt:
  azureOpenAI:
    instanceName: "my-azure-instance"
    apiKey: "${AZURE_STT_API_KEY}"
    deploymentName: "whisper-deployment"
    apiVersion: "2024-02-15-preview"

speechTab
The speechTab object configures default UI settings for speech features. These settings control what users see by default in the speech settings panel.
| Key | Type | Description | Example |
|---|---|---|---|
| conversationMode | Boolean | Enable conversation mode by default. | false |
| advancedMode | Boolean | Show advanced speech settings by default. | false |
| speechToText | Boolean or Object | Enable STT by default, or configure detailed STT settings. | false |
| textToSpeech | Boolean or Object | Enable TTS by default, or configure detailed TTS settings. | false |
speechToText (Object format)
When using an object instead of a boolean:
| Key | Type | Description | Example |
|---|---|---|---|
| engineSTT | String | Default STT engine. | |
| languageSTT | String | Default language for STT. | |
| autoTranscribeAudio | Boolean | Automatically transcribe audio messages. | |
| decibelValue | Number | Decibel threshold for voice detection. | |
| autoSendText | Number | Delay in ms before auto-sending transcribed text (0 to disable). | |
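A sketch of the object form of speechToText using every key from the table above. The languageSTT value format shown here is an assumption, so check which language identifiers your deployment accepts. The autoSendText delay of 3000 ms is an arbitrary illustrative value.

```yaml
speech:
  speechTab:
    speechToText:
      engineSTT: "openai"
      languageSTT: "en-US"       # assumed format, verify for your deployment
      autoTranscribeAudio: true
      decibelValue: -45          # voice-detection threshold
      autoSendText: 3000         # auto-send after 3000 ms; 0 disables
```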
textToSpeech (Object format)
When using an object instead of a boolean:
| Key | Type | Description | Example |
|---|---|---|---|
| engineTTS | String | Default TTS engine. | |
| voice | String | Default voice selection. | |
| languageTTS | String | Default language for TTS. | |
| automaticPlayback | Boolean | Automatically play TTS responses. | |
| playbackRate | Number | Default playback speed (1.0 = normal). | |
| cacheTTS | Boolean | Cache TTS audio for repeated playback. | |
Example:
speechTab:
  conversationMode: false
  advancedMode: false
  speechToText:
    engineSTT: "openai"
    autoTranscribeAudio: true
    decibelValue: -45
  textToSpeech:
    engineTTS: "openai"
    voice: "nova"
    automaticPlayback: false
    playbackRate: 1.0
    cacheTTS: true

Complete Example
version: 1.2.9
cache: true
speech:
  tts:
    openai:
      apiKey: "${TTS_API_KEY}"
      model: "tts-1-hd"
      voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    elevenlabs:
      apiKey: "${ELEVENLABS_API_KEY}"
      model: "eleven_multilingual_v2"
      voices: ["21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld"]
      voice_settings:
        stability: 0.5
        similarity_boost: 0.75
  stt:
    openai:
      apiKey: "${STT_API_KEY}"
      model: "whisper-1"
  speechTab:
    conversationMode: false
    advancedMode: false
    speechToText: true
    textToSpeech:
      engineTTS: "openai"
      voice: "nova"
      automaticPlayback: false

Notes
- Always use environment variable references (e.g., ${API_KEY}) for API keys in configuration files
- Multiple TTS providers can be configured; users select their preferred option in the UI
- The speechTab settings define defaults that users can override in their personal settings
- For detailed feature documentation, see Speech to Text & Text to Speech