Speech Configuration

Overview

The speech object allows you to configure Text-to-Speech (TTS) and Speech-to-Text (STT) providers directly in your librechat.yaml configuration file. This enables server-side speech services without requiring users to configure their own API keys.

Fields under speech:

tts - Text-to-Speech provider configurations
stt - Speech-to-Text provider configurations
speechTab - Default UI settings for speech features

Notes:

Multiple providers can be configured simultaneously
Users can select their preferred provider from the available options
API keys in the config file should use environment variable references (e.g., "${TTS_API_KEY}") so that secrets are not stored in the file itself

Example

speech:
  tts:
    openai:
      apiKey: "${TTS_API_KEY}"
      model: "tts-1"
      voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    elevenlabs:
      apiKey: "${ELEVENLABS_API_KEY}"
      model: "eleven_multilingual_v2"
      voices: ["voice-id-1", "voice-id-2"]
  stt:
    openai:
      apiKey: "${STT_API_KEY}"
      model: "whisper-1"
  speechTab:
    conversationMode: true
    advancedMode: false
    speechToText: true
    textToSpeech: true

tts

The tts object configures Text-to-Speech providers. Multiple providers can be configured, and users can choose which one to use.

openai

OpenAI TTS configuration using models like tts-1 or tts-1-hd.

Key	Type	Description	Required
url	String	Custom API URL. Use for OpenAI-compatible endpoints.	No
apiKey	String	OpenAI API key. Use an environment variable reference.	Yes
model	String	TTS model to use (e.g., "tts-1", "tts-1-hd").	Yes
voices	Array of Strings	Voice options available for users to select.	Yes

Example:

tts:
  openai:
    apiKey: "${TTS_API_KEY}"
    model: "tts-1"
    voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
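The url key matters only for servers that implement the OpenAI speech API. A minimal sketch, assuming a hypothetical self-hosted server at http://localhost:8000 that accepts the full /v1/audio/speech path, and hypothetical model and voice names ("my-tts-model", "speaker-1", "speaker-2"); substitute the values your server actually documents:

tts:
  openai:
    # Hypothetical OpenAI-compatible endpoint; replace with your server's URL
    url: "http://localhost:8000/v1/audio/speech"
    apiKey: "${TTS_API_KEY}"
    # Placeholder model and voice names assumed to be exposed by that server
    model: "my-tts-model"
    voices: ["speaker-1", "speaker-2"]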

azureOpenAI

Azure OpenAI TTS configuration.

Key	Type	Description	Required
instanceName	String	Azure OpenAI instance name.	Yes
apiKey	String	Azure OpenAI API key.	Yes
deploymentName	String	Deployment name of the TTS model.	Yes
apiVersion	String	Azure OpenAI API version.	Yes
model	String	TTS model identifier.	Yes
voices	Array of Strings	Available voice options.	Yes

Example:

tts:
  azureOpenAI:
    instanceName: "my-azure-instance"
    apiKey: "${AZURE_TTS_API_KEY}"
    deploymentName: "tts-deployment"
    apiVersion: "2024-02-15-preview"
    model: "tts-1"
    voices: ["alloy", "echo", "nova"]

elevenlabs

ElevenLabs TTS configuration for high-quality voice synthesis.

Key	Type	Description	Required
url	String	Custom API URL.	No
websocketUrl	String	WebSocket URL for streaming.	No
apiKey	String	ElevenLabs API key.	Yes
model	String	ElevenLabs model (e.g., "eleven_multilingual_v2").	Yes
voices	Array of Strings	Voice IDs available for selection.	Yes
voice_settings	Object	Voice customization settings (see sub-keys below).	No
pronunciation_dictionary_locators	Array of Strings	Pronunciation dictionary IDs.	No

voice_settings Sub-keys:

Key	Type	Description
similarity_boost	Number	Voice similarity enhancement (0-1).
stability	Number	Voice stability (0-1).
style	Number	Style exaggeration (0-1).
use_speaker_boost	Boolean	Enable speaker boost.

Example:

tts:
  elevenlabs:
    apiKey: "${ELEVENLABS_API_KEY}"
    model: "eleven_multilingual_v2"
    voices: ["21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld"]
    voice_settings:
      stability: 0.5
      similarity_boost: 0.75
      use_speaker_boost: true
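The example above leaves out two optional keys. A sketch that also sets style and pronunciation_dictionary_locators follows; the dictionary ID is a hypothetical placeholder, and the style value of 0 is chosen only for illustration:

tts:
  elevenlabs:
    apiKey: "${ELEVENLABS_API_KEY}"
    model: "eleven_multilingual_v2"
    voices: ["21m00Tcm4TlvDq8ikWAM"]
    voice_settings:
      stability: 0.5
      similarity_boost: 0.75
      # Style exaggeration, 0-1 (illustrative value)
      style: 0
      use_speaker_boost: true
    # Hypothetical placeholder; use a dictionary ID from your ElevenLabs account
    pronunciation_dictionary_locators: ["your-dictionary-id"]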

localai

LocalAI TTS configuration for self-hosted speech synthesis.

Key	Type	Description	Required
url	String	LocalAI server URL.	Yes
apiKey	String	API key, needed only if authentication is enabled.	No
voices	Array of Strings	Available voice models.	Yes
backend	String	TTS backend to use (e.g., "piper").	Yes

Example:

tts:
  localai:
    url: "http://localhost:8080"
    voices: ["en-us-amy-low", "en-us-danny-low"]
    backend: "piper"
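If the LocalAI server has authentication enabled, add apiKey. A sketch, assuming a hypothetical LOCALAI_API_KEY environment variable holds the key:

tts:
  localai:
    url: "http://localhost:8080"
    # Only needed when the LocalAI server requires authentication
    apiKey: "${LOCALAI_API_KEY}"
    voices: ["en-us-amy-low", "en-us-danny-low"]
    backend: "piper"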

stt

The stt object configures Speech-to-Text providers.

openai

OpenAI Whisper STT configuration.

Key	Type	Description	Required
url	String	Custom API URL. Use for OpenAI-compatible endpoints.	No
apiKey	String	OpenAI API key. Use an environment variable reference.	Yes
model	String	STT model to use (e.g., "whisper-1").	Yes

Example:

stt:
  openai:
    apiKey: "${STT_API_KEY}"
    model: "whisper-1"
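As with TTS, url points STT requests at an OpenAI-compatible server instead of OpenAI. A minimal sketch, assuming a hypothetical server at http://localhost:8000 that accepts the full /v1/audio/transcriptions path and serves a model under the hypothetical name "my-whisper-model":

stt:
  openai:
    # Hypothetical OpenAI-compatible endpoint; replace with your server's URL
    url: "http://localhost:8000/v1/audio/transcriptions"
    apiKey: "${STT_API_KEY}"
    # Placeholder model name assumed to be served by that endpoint
    model: "my-whisper-model"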

azureOpenAI

Azure OpenAI Whisper STT configuration.

Key	Type	Description	Required
instanceName	String	Azure OpenAI instance name.	Yes
apiKey	String	Azure OpenAI API key.	Yes
deploymentName	String	Deployment name of the Whisper model.	Yes
apiVersion	String	Azure OpenAI API version.	Yes

Example:

stt:
  azureOpenAI:
    instanceName: "my-azure-instance"
    apiKey: "${AZURE_STT_API_KEY}"
    deploymentName: "whisper-deployment"
    apiVersion: "2024-02-15-preview"

speechTab

The speechTab object configures default UI settings for speech features. These settings control what users see by default in the speech settings panel.

Key	Type	Description	Example
conversationMode	Boolean	Enable conversation mode by default.	false
advancedMode	Boolean	Show advanced speech settings by default.	false
speechToText	Boolean or Object	Enable STT by default, or configure detailed STT settings.	false
textToSpeech	Boolean or Object	Enable TTS by default, or configure detailed TTS settings.	false

speechToText (Object format)

When using an object instead of a boolean:

Key	Type	Description
engineSTT	String	Default STT engine.
languageSTT	String	Default language for STT.
autoTranscribeAudio	Boolean	Automatically transcribe audio messages.
decibelValue	Number	Decibel threshold for voice detection.
autoSendText	Number	Delay in ms before auto-sending transcribed text (0 to disable).

textToSpeech (Object format)

When using an object instead of a boolean:

Key	Type	Description
engineTTS	String	Default TTS engine.
voice	String	Default voice selection.
languageTTS	String	Default language for TTS.
automaticPlayback	Boolean	Automatically play TTS responses.
playbackRate	Number	Default playback speed (1.0 = normal).
cacheTTS	Boolean	Cache TTS audio for repeated playback.

Example:

speechTab:
  conversationMode: false
  advancedMode: false
  speechToText:
    engineSTT: "openai"
    autoTranscribeAudio: true
    decibelValue: -45
  textToSpeech:
    engineTTS: "openai"
    voice: "nova"
    automaticPlayback: false
    playbackRate: 1.0
    cacheTTS: true
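The example above omits autoSendText. A sketch that turns on the automatic behaviors described in the tables above, assuming a 1500 ms auto-send delay chosen only for illustration (the language keys are left out because their accepted values are not listed here):

speechTab:
  conversationMode: true
  advancedMode: false
  speechToText:
    engineSTT: "openai"
    autoTranscribeAudio: true
    decibelValue: -45
    # Wait 1500 ms before auto-sending transcribed text (0 disables)
    autoSendText: 1500
  textToSpeech:
    engineTTS: "openai"
    voice: "nova"
    # Play TTS responses automatically
    automaticPlayback: true
    playbackRate: 1.0
    cacheTTS: true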

Complete Example

version: 1.2.9
cache: true
 
speech:
  tts:
    openai:
      apiKey: "${TTS_API_KEY}"
      model: "tts-1-hd"
      voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    elevenlabs:
      apiKey: "${ELEVENLABS_API_KEY}"
      model: "eleven_multilingual_v2"
      voices: ["21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld"]
      voice_settings:
        stability: 0.5
        similarity_boost: 0.75
  stt:
    openai:
      apiKey: "${STT_API_KEY}"
      model: "whisper-1"
  speechTab:
    conversationMode: false
    advancedMode: false
    speechToText: true
    textToSpeech:
      engineTTS: "openai"
      voice: "nova"
      automaticPlayback: false

Notes

Always use environment variable references (e.g., ${API_KEY}) for API keys in configuration files
Multiple TTS providers can be configured; users select their preferred option in the UI
The speechTab settings define defaults that users can override in their personal settings
For detailed feature documentation, see Speech to Text & Text to Speech
