
Speech Configuration

Overview

The speech object allows you to configure Text-to-Speech (TTS) and Speech-to-Text (STT) providers directly in your librechat.yaml configuration file. This enables server-side speech services without requiring users to configure their own API keys.

Fields under speech:

  • tts - Text-to-Speech provider configurations
  • stt - Speech-to-Text provider configurations
  • speechTab - Default UI settings for speech features

Notes:

  • Multiple providers can be configured simultaneously
  • Users can select their preferred provider from the available options
  • API keys in the config file should use environment variable references for security
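
For instance, the examples in this document reference variables such as `TTS_API_KEY`, which must be defined in the server's environment (e.g., in a `.env` file). The values below are placeholders, not real keys:

```bash
# .env — placeholder values; substitute your actual provider keys
TTS_API_KEY=your-openai-tts-key
STT_API_KEY=your-openai-stt-key
ELEVENLABS_API_KEY=your-elevenlabs-key
```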

Example

speech

```yaml
speech:
  tts:
    openai:
      apiKey: "${TTS_API_KEY}"
      model: "tts-1"
      voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    elevenlabs:
      apiKey: "${ELEVENLABS_API_KEY}"
      model: "eleven_multilingual_v2"
      voices: ["voice-id-1", "voice-id-2"]
  stt:
    openai:
      apiKey: "${STT_API_KEY}"
      model: "whisper-1"
  speechTab:
    conversationMode: true
    advancedMode: false
    speechToText: true
    textToSpeech: true
```

tts

The tts object configures Text-to-Speech providers. Multiple providers can be configured, and users can choose which one to use.

openai

OpenAI TTS configuration using models like tts-1 or tts-1-hd.

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| url | String | Custom API URL. Use for OpenAI-compatible endpoints. | No |
| apiKey | String | OpenAI API key. Use an environment variable reference. | Yes |
| model | String | TTS model to use (e.g., "tts-1", "tts-1-hd"). | Yes |
| voices | Array of Strings | Available voice options for users to select. | Yes |

Example:

speech / tts / openai

```yaml
tts:
  openai:
    apiKey: "${TTS_API_KEY}"
    model: "tts-1"
    voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
```
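
The optional url key lets you point this configuration at an OpenAI-compatible TTS server instead of OpenAI itself. A minimal sketch, assuming a hypothetical local gateway; the URL is a placeholder, and the exact path shape depends on your server:

```yaml
tts:
  openai:
    # Placeholder URL for an OpenAI-compatible TTS endpoint
    url: "http://localhost:8000/v1"
    apiKey: "${TTS_API_KEY}"
    model: "tts-1"
    voices: ["alloy", "nova"]
```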

azureOpenAI

Azure OpenAI TTS configuration.

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| instanceName | String | Azure OpenAI instance name. | Yes |
| apiKey | String | Azure OpenAI API key. | Yes |
| deploymentName | String | The deployment name for the TTS model. | Yes |
| apiVersion | String | Azure OpenAI API version. | Yes |
| model | String | TTS model identifier. | Yes |
| voices | Array of Strings | Available voice options. | Yes |

Example:

speech / tts / azureOpenAI

```yaml
tts:
  azureOpenAI:
    instanceName: "my-azure-instance"
    apiKey: "${AZURE_TTS_API_KEY}"
    deploymentName: "tts-deployment"
    apiVersion: "2024-02-15-preview"
    model: "tts-1"
    voices: ["alloy", "echo", "nova"]
```

elevenlabs

ElevenLabs TTS configuration for high-quality voice synthesis.

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| url | String | Custom API URL. | No |
| websocketUrl | String | WebSocket URL for streaming. | No |
| apiKey | String | ElevenLabs API key. | Yes |
| model | String | ElevenLabs model (e.g., "eleven_multilingual_v2"). | Yes |
| voices | Array of Strings | Voice IDs available for selection. | Yes |
| voice_settings | Object | Voice customization settings. | No |
| pronunciation_dictionary_locators | Array of Strings | Pronunciation dictionary IDs. | No |

voice_settings Sub-keys:

| Key | Type | Description |
| --- | --- | --- |
| similarity_boost | Number | Voice similarity enhancement (0-1). |
| stability | Number | Voice stability (0-1). |
| style | Number | Style exaggeration (0-1). |
| use_speaker_boost | Boolean | Enable speaker boost. |

Example:

speech / tts / elevenlabs

```yaml
tts:
  elevenlabs:
    apiKey: "${ELEVENLABS_API_KEY}"
    model: "eleven_multilingual_v2"
    voices: ["21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld"]
    voice_settings:
      stability: 0.5
      similarity_boost: 0.75
      use_speaker_boost: true
```
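
The optional url, websocketUrl, and pronunciation_dictionary_locators keys are not shown above. A sketch with placeholder values (the URLs and dictionary ID are illustrative, not defaults):

```yaml
tts:
  elevenlabs:
    apiKey: "${ELEVENLABS_API_KEY}"
    model: "eleven_multilingual_v2"
    voices: ["21m00Tcm4TlvDq8ikWAM"]
    url: "https://api.elevenlabs.io"         # placeholder proxy/base URL
    websocketUrl: "wss://api.elevenlabs.io"  # placeholder streaming URL
    pronunciation_dictionary_locators: ["dictionary-id-1"]  # placeholder ID
```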

localai

LocalAI TTS configuration for self-hosted speech synthesis.

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| url | String | LocalAI server URL. | Yes |
| apiKey | String | API key if authentication is enabled. | No |
| voices | Array of Strings | Available voice models. | Yes |
| backend | String | TTS backend to use (e.g., "piper"). | Yes |

Example:

speech / tts / localai

```yaml
tts:
  localai:
    url: "http://localhost:8080"
    voices: ["en-us-amy-low", "en-us-danny-low"]
    backend: "piper"
```

stt

The stt object configures Speech-to-Text providers.

openai

OpenAI Whisper STT configuration.

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| url | String | Custom API URL. Use for OpenAI-compatible endpoints. | No |
| apiKey | String | OpenAI API key. Use an environment variable reference. | Yes |
| model | String | STT model to use (e.g., "whisper-1"). | Yes |

Example:

speech / stt / openai

```yaml
stt:
  openai:
    apiKey: "${STT_API_KEY}"
    model: "whisper-1"
```
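
As with TTS, the optional url key can target an OpenAI-compatible transcription server, such as a self-hosted Whisper deployment. A sketch with a placeholder URL; whether the value should be a base URL or a full endpoint depends on your server:

```yaml
stt:
  openai:
    # Placeholder URL for an OpenAI-compatible STT endpoint
    url: "http://localhost:8000/v1"
    apiKey: "${STT_API_KEY}"
    model: "whisper-1"
```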

azureOpenAI

Azure OpenAI Whisper STT configuration.

| Key | Type | Description | Required |
| --- | --- | --- | --- |
| instanceName | String | Azure OpenAI instance name. | Yes |
| apiKey | String | Azure OpenAI API key. | Yes |
| deploymentName | String | The deployment name for the Whisper model. | Yes |
| apiVersion | String | Azure OpenAI API version. | Yes |

Example:

speech / stt / azureOpenAI

```yaml
stt:
  azureOpenAI:
    instanceName: "my-azure-instance"
    apiKey: "${AZURE_STT_API_KEY}"
    deploymentName: "whisper-deployment"
    apiVersion: "2024-02-15-preview"
```

speechTab

The speechTab object configures default UI settings for speech features. These settings control what users see by default in the speech settings panel.

| Key | Type | Description | Default |
| --- | --- | --- | --- |
| conversationMode | Boolean | Enable conversation mode by default. | false |
| advancedMode | Boolean | Show advanced speech settings by default. | false |
| speechToText | Boolean or Object | Enable STT by default, or configure detailed STT settings. | false |
| textToSpeech | Boolean or Object | Enable TTS by default, or configure detailed TTS settings. | false |

speechToText (Object format)

When using an object instead of a boolean:

| Key | Type | Description |
| --- | --- | --- |
| engineSTT | String | Default STT engine. |
| languageSTT | String | Default language for STT. |
| autoTranscribeAudio | Boolean | Automatically transcribe audio messages. |
| decibelValue | Number | Decibel threshold for voice detection. |
| autoSendText | Number | Delay in ms before auto-sending transcribed text (0 to disable). |
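
For instance, a speechToText object that sets every key above. The languageSTT value is an assumed locale string; check your deployment for the accepted format:

```yaml
speechTab:
  speechToText:
    engineSTT: "openai"
    languageSTT: "en-US"    # assumed locale format
    autoTranscribeAudio: true
    decibelValue: -45
    autoSendText: 3000      # auto-send 3000 ms after transcription; 0 disables
```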

textToSpeech (Object format)

When using an object instead of a boolean:

| Key | Type | Description |
| --- | --- | --- |
| engineTTS | String | Default TTS engine. |
| voice | String | Default voice selection. |
| languageTTS | String | Default language for TTS. |
| automaticPlayback | Boolean | Automatically play TTS responses. |
| playbackRate | Number | Default playback speed (1.0 = normal). |
| cacheTTS | Boolean | Cache TTS audio for repeated playback. |

Example:

speech / speechTab

```yaml
speechTab:
  conversationMode: false
  advancedMode: false
  speechToText:
    engineSTT: "openai"
    autoTranscribeAudio: true
    decibelValue: -45
  textToSpeech:
    engineTTS: "openai"
    voice: "nova"
    automaticPlayback: false
    playbackRate: 1.0
    cacheTTS: true
```

Complete Example

librechat.yaml

```yaml
version: 1.2.9
cache: true

speech:
  tts:
    openai:
      apiKey: "${TTS_API_KEY}"
      model: "tts-1-hd"
      voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    elevenlabs:
      apiKey: "${ELEVENLABS_API_KEY}"
      model: "eleven_multilingual_v2"
      voices: ["21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld"]
      voice_settings:
        stability: 0.5
        similarity_boost: 0.75
  stt:
    openai:
      apiKey: "${STT_API_KEY}"
      model: "whisper-1"
  speechTab:
    conversationMode: false
    advancedMode: false
    speechToText: true
    textToSpeech:
      engineTTS: "openai"
      voice: "nova"
      automaticPlayback: false
```

Notes

  • Always use environment variable references (e.g., ${API_KEY}) for API keys in configuration files
  • Multiple TTS providers can be configured; users select their preferred option in the UI
  • The speechTab settings define defaults that users can override in their personal settings
  • For detailed feature documentation, see Speech to Text & Text to Speech