Speech to Text (STT) and Text to Speech (TTS)
Speech Introduction
The Speech Configuration includes settings for both Speech-to-Text (STT) and Text-to-Speech (TTS) under a unified `speech:` section. Additionally, there is a new `speechTab` menu for user-specific settings.
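For orientation, here is a minimal sketch of how the STT and TTS pieces fit together under the unified `speech:` section. The values are borrowed from the OpenAI examples shown later on this page and are illustrative only:

```yaml
# Minimal sketch: STT and TTS side by side under the unified `speech:` section.
# Values come from the OpenAI examples later on this page; adjust to your provider.
speech:
  stt:
    openai:
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'
  tts:
    openai:
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
```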
Speech Tab (optional)
The `speechTab` menu provides customizable options for conversation and advanced modes, as well as detailed settings for STT and TTS. These settings define the defaults for users.
Example:

```yaml
speech:
  speechTab:
    conversationMode: true
    advancedMode: false
    speechToText:
      engineSTT: "external"
      languageSTT: "English (US)"
      autoTranscribeAudio: true
      decibelValue: -45
      autoSendText: 0
    textToSpeech:
      engineTTS: "external"
      voice: "alloy"
      languageTTS: "en"
      automaticPlayback: true
      playbackRate: 1.0
      cacheTTS: true
```
STT (Speech-to-Text)
The Speech-to-Text (STT) feature converts spoken words into written text. To start a transcription, click the STT button (near the send button) or use the keyboard shortcut ++Ctrl+Alt+L++.
Available STT Services
- Local STT
  - Browser-based
  - Whisper (tested on LocalAI)
- Cloud STT
  - OpenAI Whisper
  - Azure Whisper
  - Other OpenAI-compatible STT services
Configuring Local STT
Browser-based

No setup required. Ensure the “Speech To Text” switch in the speech settings tab is enabled and “Browser” is selected in the engine dropdown.
Whisper Local

Requires a local Whisper instance:

```yaml
speech:
  stt:
    openai:
      url: 'http://host.docker.internal:8080/v1/audio/transcriptions'
      model: 'whisper'
```
Configuring Cloud STT
OpenAI Whisper

```yaml
speech:
  stt:
    openai:
      apiKey: '${STT_API_KEY}'
      model: 'whisper-1'
```

Azure Whisper

```yaml
speech:
  stt:
    azureOpenAI:
      instanceName: 'instanceName'     # your Azure OpenAI instance name
      apiKey: '${STT_API_KEY}'
      deploymentName: 'deploymentName' # your Whisper deployment name
      apiVersion: 'apiVersion'         # the Azure OpenAI API version
```

Other OpenAI-compatible STT services

Refer to the OpenAI Whisper section, adjusting the `url` and `model` as needed. Example:

```yaml
speech:
  stt:
    openai:
      url: 'http://host.docker.internal:8080/v1/audio/transcriptions'
      model: 'whisper'
```
TTS (Text-to-Speech)
The Text-to-Speech (TTS) feature converts written text into spoken words. Various TTS services are available:
Available TTS Services
- Local TTS
  - Browser-based
  - Piper (tested on LocalAI)
  - Coqui (tested on LocalAI)
- Cloud TTS
  - OpenAI TTS
  - Azure OpenAI
  - ElevenLabs
  - Other OpenAI/ElevenLabs-compatible TTS services
Configuring Local TTS
Browser-based

No setup required. Ensure the “Text To Speech” switch in the speech settings tab is enabled and “Browser” is selected in the engine dropdown.
Piper

Requires a local Piper instance:

```yaml
speech:
  tts:
    localai:
      url: "http://host.docker.internal:8080/tts"
      apiKey: "EMPTY"
      voices: [
        "en-us-amy-low.onnx",
        "en-us-danny-low.onnx",
        "en-us-libritts-high.onnx",
        "en-us-ryan-high.onnx",
      ]
      backend: "piper"
```
Coqui

Requires a local Coqui instance:

```yaml
speech:
  tts:
    localai:
      url: 'http://localhost:8080/v1/audio/synthesize'
      voices: ['tts_models/en/ljspeech/glow-tts', 'tts_models/en/ljspeech/tacotron2', 'tts_models/en/ljspeech/waveglow']
      backend: 'coqui'
```
Configuring Cloud TTS
OpenAI TTS

```yaml
speech:
  tts:
    openai:
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
```
Azure OpenAI

```yaml
speech:
  tts:
    azureOpenAI:
      instanceName: ''   # your Azure OpenAI instance name
      apiKey: '${TTS_API_KEY}'
      deploymentName: '' # your TTS deployment name
      apiVersion: ''     # the Azure OpenAI API version
      model: 'tts-1'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
```
ElevenLabs

```yaml
speech:
  tts:
    elevenlabs:
      apiKey: '${TTS_API_KEY}'
      model: 'eleven_multilingual_v2'
      voices: ['202898wioas09d2', 'addwqr324tesfsf', '3asdasr3qrq44w', 'adsadsa']
```

Additional ElevenLabs-specific parameters can be added as follows:

```yaml
voice_settings:
  similarity_boost: '' # number
  stability: '' # number
  style: '' # number
  use_speaker_boost: # boolean
pronunciation_dictionary_locators: [''] # list of strings (array)
```
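For clarity, here is a sketch of where these optional parameters would sit in the full configuration. It assumes they are placed directly under the `elevenlabs` key, alongside `apiKey`, `model`, and `voices`; the empty values are placeholders to fill in:

```yaml
# Sketch only: shows one possible placement of the optional ElevenLabs parameters.
speech:
  tts:
    elevenlabs:
      apiKey: '${TTS_API_KEY}'
      model: 'eleven_multilingual_v2'
      voices: ['202898wioas09d2', 'addwqr324tesfsf', '3asdasr3qrq44w', 'adsadsa']
      voice_settings:
        similarity_boost: '' # number
        stability: '' # number
        style: '' # number
        use_speaker_boost: # boolean
      pronunciation_dictionary_locators: [''] # list of strings (array)
```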
OpenAI-compatible TTS services

Refer to the OpenAI TTS section, adjusting the `url` variable as needed. Example:

```yaml
speech:
  tts:
    openai:
      url: 'http://host.docker.internal:8080/v1/audio/synthesize'
      apiKey: '${TTS_API_KEY}'
      model: 'tts-1'
      voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
```
ElevenLabs-compatible TTS services

Refer to the ElevenLabs section, adjusting the `url` variable as needed. Example:

```yaml
speech:
  tts:
    elevenlabs:
      url: 'http://host.docker.internal:8080/v1/audio/synthesize'
      apiKey: '${TTS_API_KEY}'
      model: 'eleven_multilingual_v2'
      voices: ['202898wioas09d2', 'addwqr324tesfsf', '3asdasr3qrq44w', 'adsadsa']
```