Speech Settings
Configuration of the Speech-to-Text (STT) and Text-to-Speech (TTS) features
Speech Introduction
The speech configuration includes settings for both Speech-to-Text (STT) and Text-to-Speech (TTS) under a unified speech: section. Additionally, a speechTab section provides user-specific default settings.
See Also: For detailed YAML configuration schema and all available options, see the Speech Object Structure documentation.
Environment Variables
When using cloud-based STT/TTS services, you'll need to set API keys in your .env file:
These keys are then referenced in your librechat.yaml configuration using ${STT_API_KEY} and ${TTS_API_KEY}.
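For example (placeholder values; substitute your actual keys):

```bash
# .env
STT_API_KEY=your-stt-api-key
TTS_API_KEY=your-tts-api-key
```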
Speech Tab (optional)
The speechTab menu provides customizable options for conversation and advanced modes, as well as detailed settings for STT and TTS. These values define the default settings for users.
example:
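A sketch of a speechTab configuration (key names follow the Speech Object Structure schema; the values shown are illustrative defaults):

```yaml
speech:
  speechTab:
    conversationMode: true
    advancedMode: false
    speechToText:
      engineSTT: "external"        # "browser" or "external"
      languageSTT: "English (US)"
      autoTranscribeAudio: true
      decibelValue: -45
      autoSendText: 0
    textToSpeech:
      engineTTS: "external"        # "browser" or "external"
      voice: "alloy"
      languageTTS: "en"
      automaticPlayback: true
      playbackRate: 1.0
      cacheTTS: true
```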
STT (Speech-to-Text)
The Speech-to-Text (STT) feature converts spoken words into written text. To enable STT, click on the STT button (near the send button) or use the key combination ++Ctrl+Alt+L++ to start the transcription.
Available STT Services
- Local STT
- Browser-based
- Whisper (tested on LocalAI)
- Cloud STT
- OpenAI Whisper
- Azure Whisper
- Other OpenAI-compatible STT services
Configuring Local STT
- Browser-based: No setup required. Ensure the "Speech To Text" switch in the speech settings tab is enabled and "Browser" is selected in the engine dropdown.
- Whisper Local: Requires a local Whisper instance.
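A local Whisper setup can reuse the OpenAI-compatible STT configuration, with the url pointed at the local instance. A sketch, assuming LocalAI serving Whisper on port 8080 (the endpoint path and model name are illustrative):

```yaml
speech:
  stt:
    openai:
      url: 'http://localhost:8080/v1/audio/transcriptions'
      apiKey: '${STT_API_KEY}'   # a dummy value may suffice for a local instance
      model: 'whisper-1'
```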
Configuring Cloud STT
Azure Endpoint Domain Support
The instanceName field supports both Azure OpenAI domain formats:
- New format: .cognitiveservices.azure.com (e.g., my-instance.cognitiveservices.azure.com)
- Legacy format: .openai.azure.com (e.g., my-instance.openai.azure.com)
You can specify either the full domain or just the instance name. If you provide a full domain including .azure.com, it will be used as-is. Otherwise, the legacy .openai.azure.com format will be applied for backward compatibility.
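A sketch of an Azure Whisper STT configuration (the deployment name and API version are placeholders for your own values):

```yaml
speech:
  stt:
    azureOpenAI:
      instanceName: 'my-instance'   # or a full domain, e.g. my-instance.cognitiveservices.azure.com
      apiKey: '${STT_API_KEY}'
      deploymentName: 'whisper-deployment'
      apiVersion: '2024-02-01'
```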
Refer to the OpenAI Whisper section, adjusting the url and model as needed.
example:
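A sketch for OpenAI Whisper, or any OpenAI-compatible STT service when a custom url is set (the url shown is a placeholder):

```yaml
speech:
  stt:
    openai:
      url: 'https://api.example.com/v1/audio/transcriptions'  # omit for OpenAI's default endpoint
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'
```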
TTS (Text-to-Speech)
The Text-to-Speech (TTS) feature converts written text into spoken words. Various TTS services are available:
Available TTS Services
- Local TTS
- Browser-based
- Piper (tested on LocalAI)
- Coqui (tested on LocalAI)
- Cloud TTS
- OpenAI TTS
- Azure OpenAI
- ElevenLabs
- Other OpenAI/ElevenLabs-compatible TTS services
Configuring Local TTS
- Browser-based: No setup required. Ensure the "Text To Speech" switch in the speech settings tab is enabled and "Browser" is selected in the engine dropdown.
- Piper: Requires a local Piper instance.
- Coqui: Requires a local Coqui instance.
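A sketch for a LocalAI-backed TTS instance (the URL, backend, and voice file name are illustrative; check the Speech Object Structure schema for the exact keys):

```yaml
speech:
  tts:
    localai:
      url: 'http://localhost:8080/tts'
      voices: ['en-us-amy-low.onnx']
      backend: 'piper'   # or 'coqui'
```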
Configuring Cloud TTS
Azure Endpoint Domain Support
The instanceName field supports both Azure OpenAI domain formats:
- New format: .cognitiveservices.azure.com (e.g., my-instance.cognitiveservices.azure.com)
- Legacy format: .openai.azure.com (e.g., my-instance.openai.azure.com)
You can specify either the full domain or just the instance name. If you provide a full domain including .azure.com, it will be used as-is. Otherwise, the legacy .openai.azure.com format will be applied for backward compatibility.
Additional ElevenLabs-specific parameters can be added as follows:
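For instance (the parameter values are illustrative; see the ElevenLabs API reference for their meaning):

```yaml
speech:
  tts:
    elevenlabs:
      apiKey: '${TTS_API_KEY}'
      model: 'eleven_multilingual_v2'
      voices: ['your-voice-id']
      voice_settings:
        similarity_boost: 0.75
        stability: 0.5
        style: 0
        use_speaker_boost: true
      pronunciation_dictionary_locators: ['']
```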
Refer to the OpenAI TTS section, adjusting the url variable as needed.
example:
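A sketch, assuming a hypothetical OpenAI-compatible TTS endpoint (the url is a placeholder):

```yaml
speech:
  tts:
    openai:
      url: 'http://localhost:8080/v1/audio/speech'  # your compatible endpoint
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
```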
Refer to the ElevenLabs section, adjusting the url variable as needed.
example:
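A sketch, assuming a hypothetical ElevenLabs-compatible endpoint (the url and voice ID are placeholders):

```yaml
speech:
  tts:
    elevenlabs:
      url: 'https://api.example.com/v1'   # your compatible endpoint
      apiKey: '${TTS_API_KEY}'
      model: 'eleven_multilingual_v2'
      voices: ['your-voice-id']
```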