# Speech Configuration (/docs/configuration/librechat_yaml/object_structure/speech)

## Overview

The `speech` object allows you to configure Text-to-Speech (TTS) and Speech-to-Text (STT) providers directly in your `librechat.yaml` configuration file. This enables server-side speech services without requiring users to configure their own API keys.

**Fields under `speech`:**

- `tts` - Text-to-Speech provider configurations
- `stt` - Speech-to-Text provider configurations
- `speechTab` - Default UI settings for speech features

**Notes:**

- Multiple providers can be configured simultaneously
- Users can select their preferred provider from the available options
- API keys in the config file should use environment variable references for security

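The `${...}` references in the examples resolve against the server's environment. A minimal sketch of the corresponding entries in a `.env` file (the variable names mirror the examples on this page; substitute your own keys):

```bash filename=".env"
# Referenced as ${TTS_API_KEY}, ${STT_API_KEY}, etc. in librechat.yaml
TTS_API_KEY=your-openai-api-key
STT_API_KEY=your-openai-api-key
ELEVENLABS_API_KEY=your-elevenlabs-api-key
```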
## Example

```yaml filename="speech"
speech:
  tts:
    openai:
      apiKey: "${TTS_API_KEY}"
      model: "tts-1"
      voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    elevenlabs:
      apiKey: "${ELEVENLABS_API_KEY}"
      model: "eleven_multilingual_v2"
      voices: ["voice-id-1", "voice-id-2"]
  stt:
    openai:
      apiKey: "${STT_API_KEY}"
      model: "whisper-1"
  speechTab:
    conversationMode: true
    advancedMode: false
    speechToText: true
    textToSpeech: true
```

---

## tts

The `tts` object configures Text-to-Speech providers. Multiple providers can be configured, and users can choose which one to use.

### openai

OpenAI TTS configuration using models like `tts-1` or `tts-1-hd`.

<OptionTable
  options={[
    ['url', 'String', 'Custom API URL (optional). Use for OpenAI-compatible endpoints.', ''],
    ['apiKey', 'String', 'OpenAI API key. Use environment variable reference.', 'Required'],
    ['model', 'String', 'TTS model to use (e.g., "tts-1", "tts-1-hd").', 'Required'],
    ['voices', 'Array of Strings', 'Available voice options for users to select.', 'Required'],
  ]}
/>

**Example:**
```yaml filename="speech / tts / openai"
tts:
  openai:
    apiKey: "${TTS_API_KEY}"
    model: "tts-1"
    voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
```
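Where the table above lists an optional `url`, the same block can point at an OpenAI-compatible server. A sketch, assuming a self-hosted endpoint (the host and port here are placeholders, not a real service):

```yaml filename="speech / tts / openai (custom endpoint)"
tts:
  openai:
    # Placeholder URL for an OpenAI-compatible TTS server
    url: "http://localhost:8000/v1/audio/speech"
    apiKey: "${TTS_API_KEY}"
    model: "tts-1"
    voices: ["alloy", "nova"]
```

Check your server's documentation for the exact path it expects; some OpenAI-compatible servers take only the base URL.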

### azureOpenAI

Azure OpenAI TTS configuration.

<OptionTable
  options={[
    ['instanceName', 'String', 'Azure OpenAI instance name.', 'Required'],
    ['apiKey', 'String', 'Azure OpenAI API key.', 'Required'],
    ['deploymentName', 'String', 'The deployment name for the TTS model.', 'Required'],
    ['apiVersion', 'String', 'Azure OpenAI API version.', 'Required'],
    ['model', 'String', 'TTS model identifier.', 'Required'],
    ['voices', 'Array of Strings', 'Available voice options.', 'Required'],
  ]}
/>

**Example:**
```yaml filename="speech / tts / azureOpenAI"
tts:
  azureOpenAI:
    instanceName: "my-azure-instance"
    apiKey: "${AZURE_TTS_API_KEY}"
    deploymentName: "tts-deployment"
    apiVersion: "2024-02-15-preview"
    model: "tts-1"
    voices: ["alloy", "echo", "nova"]
```

### elevenlabs

ElevenLabs TTS configuration for high-quality voice synthesis.

<OptionTable
  options={[
    ['url', 'String', 'Custom API URL (optional).', ''],
    ['websocketUrl', 'String', 'WebSocket URL for streaming (optional).', ''],
    ['apiKey', 'String', 'ElevenLabs API key.', 'Required'],
    ['model', 'String', 'ElevenLabs model (e.g., "eleven_multilingual_v2").', 'Required'],
    ['voices', 'Array of Strings', 'Voice IDs available for selection.', 'Required'],
    ['voice_settings', 'Object', 'Voice customization settings (optional).', ''],
    ['pronunciation_dictionary_locators', 'Array of Strings', 'Pronunciation dictionary IDs (optional).', ''],
  ]}
/>

**voice_settings Sub-keys:**
<OptionTable
  options={[
    ['similarity_boost', 'Number', 'Voice similarity enhancement (0-1).', ''],
    ['stability', 'Number', 'Voice stability (0-1).', ''],
    ['style', 'Number', 'Style exaggeration (0-1).', ''],
    ['use_speaker_boost', 'Boolean', 'Enable speaker boost.', ''],
  ]}
/>

**Example:**
```yaml filename="speech / tts / elevenlabs"
tts:
  elevenlabs:
    apiKey: "${ELEVENLABS_API_KEY}"
    model: "eleven_multilingual_v2"
    voices: ["21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld"]
    voice_settings:
      stability: 0.5
      similarity_boost: 0.75
      use_speaker_boost: true
```

### localai

LocalAI TTS configuration for self-hosted speech synthesis.

<OptionTable
  options={[
    ['url', 'String', 'LocalAI server URL.', 'Required'],
    ['apiKey', 'String', 'API key if authentication is enabled (optional).', ''],
    ['voices', 'Array of Strings', 'Available voice models.', 'Required'],
    ['backend', 'String', 'TTS backend to use (e.g., "piper").', 'Required'],
  ]}
/>

**Example:**
```yaml filename="speech / tts / localai"
tts:
  localai:
    url: "http://localhost:8080"
    voices: ["en-us-amy-low", "en-us-danny-low"]
    backend: "piper"
```

---

## stt

The `stt` object configures Speech-to-Text providers.

### openai

OpenAI Whisper STT configuration.

<OptionTable
  options={[
    ['url', 'String', 'Custom API URL (optional). Use for OpenAI-compatible endpoints.', ''],
    ['apiKey', 'String', 'OpenAI API key. Use environment variable reference.', 'Required'],
    ['model', 'String', 'STT model to use (e.g., "whisper-1").', 'Required'],
  ]}
/>

**Example:**
```yaml filename="speech / stt / openai"
stt:
  openai:
    apiKey: "${STT_API_KEY}"
    model: "whisper-1"
```
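As with TTS, the optional `url` lets this block target an OpenAI-compatible transcription server. A sketch, assuming a self-hosted Whisper-compatible endpoint (the host and port are placeholders):

```yaml filename="speech / stt / openai (custom endpoint)"
stt:
  openai:
    # Placeholder URL for an OpenAI-compatible transcription server
    url: "http://localhost:8000/v1/audio/transcriptions"
    apiKey: "${STT_API_KEY}"
    model: "whisper-1"
```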

### azureOpenAI

Azure OpenAI Whisper STT configuration.

<OptionTable
  options={[
    ['instanceName', 'String', 'Azure OpenAI instance name.', 'Required'],
    ['apiKey', 'String', 'Azure OpenAI API key.', 'Required'],
    ['deploymentName', 'String', 'The deployment name for the Whisper model.', 'Required'],
    ['apiVersion', 'String', 'Azure OpenAI API version.', 'Required'],
  ]}
/>

**Example:**
```yaml filename="speech / stt / azureOpenAI"
stt:
  azureOpenAI:
    instanceName: "my-azure-instance"
    apiKey: "${AZURE_STT_API_KEY}"
    deploymentName: "whisper-deployment"
    apiVersion: "2024-02-15-preview"
```

---

## speechTab

The `speechTab` object configures default UI settings for speech features. These settings control what users see by default in the speech settings panel.

<OptionTable
  options={[
    ['conversationMode', 'Boolean', 'Enable conversation mode by default.', 'false'],
    ['advancedMode', 'Boolean', 'Show advanced speech settings by default.', 'false'],
    ['speechToText', 'Boolean or Object', 'Enable STT by default, or configure detailed STT settings.', 'false'],
    ['textToSpeech', 'Boolean or Object', 'Enable TTS by default, or configure detailed TTS settings.', 'false'],
  ]}
/>

### speechToText (Object format)

When `speechToText` is set to an object instead of a boolean, the following keys are available:

<OptionTable
  options={[
    ['engineSTT', 'String', 'Default STT engine.', ''],
    ['languageSTT', 'String', 'Default language for STT.', ''],
    ['autoTranscribeAudio', 'Boolean', 'Automatically transcribe audio messages.', ''],
    ['decibelValue', 'Number', 'Decibel threshold for voice detection.', ''],
    ['autoSendText', 'Number', 'Delay in ms before auto-sending transcribed text (0 to disable).', ''],
  ]}
/>
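The object form sketched with all keys from the table (the engine name matches the example later on this page; the language code and thresholds are illustrative values to tune for your setup):

```yaml filename="speech / speechTab / speechToText"
speechTab:
  speechToText:
    engineSTT: "openai"
    languageSTT: "en-US"
    autoTranscribeAudio: true
    decibelValue: -45
    autoSendText: 0   # 0 disables auto-send
```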

### textToSpeech (Object format)

When `textToSpeech` is set to an object instead of a boolean, the following keys are available:

<OptionTable
  options={[
    ['engineTTS', 'String', 'Default TTS engine.', ''],
    ['voice', 'String', 'Default voice selection.', ''],
    ['languageTTS', 'String', 'Default language for TTS.', ''],
    ['automaticPlayback', 'Boolean', 'Automatically play TTS responses.', ''],
    ['playbackRate', 'Number', 'Default playback speed (1.0 = normal).', ''],
    ['cacheTTS', 'Boolean', 'Cache TTS audio for repeated playback.', ''],
  ]}
/>

**Example:**
```yaml filename="speech / speechTab"
speechTab:
  conversationMode: false
  advancedMode: false
  speechToText:
    engineSTT: "openai"
    autoTranscribeAudio: true
    decibelValue: -45
  textToSpeech:
    engineTTS: "openai"
    voice: "nova"
    automaticPlayback: false
    playbackRate: 1.0
    cacheTTS: true
```

---

## Complete Example

```yaml filename="librechat.yaml"
version: 1.2.9
cache: true

speech:
  tts:
    openai:
      apiKey: "${TTS_API_KEY}"
      model: "tts-1-hd"
      voices: ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
    elevenlabs:
      apiKey: "${ELEVENLABS_API_KEY}"
      model: "eleven_multilingual_v2"
      voices: ["21m00Tcm4TlvDq8ikWAM", "AZnzlk1XvdvUeBnXmlld"]
      voice_settings:
        stability: 0.5
        similarity_boost: 0.75
  stt:
    openai:
      apiKey: "${STT_API_KEY}"
      model: "whisper-1"
  speechTab:
    conversationMode: false
    advancedMode: false
    speechToText: true
    textToSpeech:
      engineTTS: "openai"
      voice: "nova"
      automaticPlayback: false
```

---

## Notes

- Always use environment variable references (e.g., `${API_KEY}`) for API keys in configuration files
- Multiple TTS providers can be configured; users select their preferred option in the UI
- The `speechTab` settings define defaults that users can override in their personal settings
- For detailed feature documentation, see [Speech to Text & Text to Speech](/docs/configuration/stt_tts)
