Speech Settings

Speech to Text (STT) and Text to Speech (TTS)

ℹ️
Upcoming STT/TTS Enhancements

Support for the Google Cloud STT/TTS and Deepgram services is planned for a future release

STT

The Speech-to-Text (STT) feature allows you to convert spoken words into written text. Once STT is configured, click the STT button (near the send button) and start speaking. Alternatively, you can use the keyboard shortcut Ctrl + Alt + L to start the transcription.

There are many different STT services available, but here’s a list of some of the most popular ones:

Local STT

  • Browser-based
  • Whisper (tested on LocalAI and HomeAssistant)

Cloud STT

  • OpenAI Whisper (via API calls)
  • Azure Whisper (via API calls)
  • All the other OpenAI compatible STT services (via API calls)

Browser-based

No setup required: just make sure the “STT button” in the speech settings tab is enabled and that “Browser” is selected in the engine dropdown. When you click the button, the browser will ask for permission to use the microphone. Once permission is granted, you can start speaking, and the text will be displayed in the chat window in real time. When you’re done speaking, click the button again to stop the transcription, or wait for the timeout to stop it automatically

Whisper local

⚠️
Compatibility Testing

Whisper local has been tested only on LocalAI and HomeAssistant’s whisper docker image, but it should work on any other local whisper instance

To use the Whisper local STT service, you need to have a local whisper instance running. You can find more information on how to set up a local whisper instance with LocalAI in LocalAI’s documentation. Once you have a local whisper instance running, you can configure the STT service as follows:

In the librechat.yaml file, add this configuration:

librechat.yaml
stt:
  openai:
    url: 'http://host.docker.internal:8080/v1/audio/transcriptions'
    apiKey: '${STT_API_KEY}'
    model: 'whisper'

Here, url is the URL of the whisper instance, apiKey references the STT_API_KEY variable in your .env file, and model is the model you want to use for the transcription
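
The ${STT_API_KEY} placeholder is resolved from your .env file. As a minimal sketch, assuming a local whisper instance that does not validate the key, any placeholder value will do:

.env
# Hypothetical placeholder; a local instance that skips authentication accepts any value
STT_API_KEY=sk-local-placeholder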

OpenAI Whisper

Create an OpenAI API key on OpenAI’s website

Then, in the librechat.yaml file, add the following configuration:

librechat.yaml
stt:
  openai:
    apiKey: '${STT_API_KEY}'
    model: 'whisper-1'

📔
Understanding Guide

If you want to understand more about these variables, check the Whisper local section

Azure Whisper (WIP)

In the librechat.yaml file, add the following configuration to your already existing Azure configuration:

Don’t have an Azure configuration yet?

If you don’t have one, you can find more information on how to set up an Azure STT service in Azure’s documentation

librechat.yaml
models:
  whisper:
    deploymentName: whisper-01
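
For context, here is a minimal sketch of how the whisper entry might sit inside an existing azureOpenAI endpoint group; the group name, instance name, and API version below are hypothetical placeholders for your own values:

librechat.yaml
endpoints:
  azureOpenAI:
    groups:
      - group: 'my-azure-group' # hypothetical group name
        apiKey: '${AZURE_API_KEY}'
        instanceName: 'my-instance' # your Azure resource name
        version: '2024-02-01' # hypothetical API version
        models:
          whisper:
            deploymentName: whisper-01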

📔
Understanding Guide

If you want to understand more about these variables, check the Whisper local section

OpenAI compatible STT services

Check the OpenAI Whisper section; just change the url and model variables to the ones you want to use, as in the example below
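
For example, a minimal sketch assuming a hypothetical provider at api.compatible.com and a placeholder model name:

librechat.yaml
stt:
  openai:
    url: 'https://api.compatible.com/v1/audio/transcriptions' # hypothetical endpoint
    apiKey: '${STT_API_KEY}'
    model: 'whisper-large-v3' # placeholder; use your provider's model name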

TTS

The Text-to-Speech (TTS) feature allows you to convert written text into spoken words. There are many different TTS services available, but here’s a list of some of the most popular ones:

Local TTS

  • Browser-based
  • Piper (tested on LocalAI)
  • Coqui (tested on LocalAI)

Cloud TTS

  • OpenAI TTS
  • ElevenLabs
  • All the other OpenAI compatible TTS services

Browser-based

No setup required: just make sure the “TTS button” in the speech settings tab is enabled and that “Browser” is selected in the engine dropdown. When you click the button, it will start speaking; click the button again to stop the speech, or wait for it to finish

Piper

⚠️
Compatibility Testing

Piper has been tested only on LocalAI, but it should work on any other local piper instance

To use the Piper local TTS service, you need to have a local piper instance running. You can find more information on how to set up a local piper instance with LocalAI in LocalAI’s documentation. Once you have a local piper instance running, you can configure the TTS service as follows:

In the librechat.yaml file, add this configuration:

librechat.yaml
tts:
  localai:
    url: 'http://host.docker.internal:8080/tts'
    apiKey: 'EMPTY'
    voices:
      - 'en-us-amy-low.onnx'
      - 'en-us-danny-low.onnx'
      - 'en-us-libritts-high.onnx'
      - 'en-us-ryan-high.onnx'
    backend: 'piper'

The voices above are just examples; you can find more information about available voices in LocalAI’s documentation

Coqui

⚠️
Compatibility Testing

Coqui has been tested only on LocalAI, but it should work on any other local coqui instance

To use the Coqui local TTS service, you need to have a local coqui instance running. You can find more information on how to set up a local coqui instance with LocalAI in LocalAI’s documentation. Once you have a local coqui instance running, you can configure the TTS service as follows:

In the librechat.yaml file, add this configuration:

librechat.yaml
tts:
  localai:
    url: 'http://localhost:8080/v1/audio/synthesize'
    voices:
      - 'tts_models/en/ljspeech/glow-tts'
      - 'tts_models/en/ljspeech/tacotron2'
      - 'tts_models/en/ljspeech/waveglow'
    backend: 'coqui'

The voices above are just examples; you can find more information about available voices in LocalAI’s documentation

OpenAI TTS

Create an OpenAI API key on OpenAI’s website

Then, in the librechat.yaml file, add the following configuration:

librechat.yaml
tts:
  openai:
    apiKey: '${TTS_API_KEY}'
    model: 'tts-1'
    voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']

You can choose between the tts-1 and tts-1-hd models; more information about the models can be found in OpenAI’s documentation.

The voices can be alloy, echo, fable, etc.; more information about the voices can be found in OpenAI’s documentation.
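
As with STT, the ${TTS_API_KEY} placeholder is resolved from your .env file; the value below is a hypothetical placeholder for your real key:

.env
TTS_API_KEY=your-openai-api-key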

ElevenLabs

Create an ElevenLabs API key on the ElevenLabs website

Then, click on the “Voices” tab and copy the IDs of the voices you want to use. If you haven’t added any yet, open the “Voice Library”, where you can find many pre-made voices; add one, then copy its ID by clicking the “ID” button

In the librechat.yaml file, add the following configuration:

librechat.yaml
tts:
  elevenlabs:
    apiKey: '${TTS_API_KEY}'
    model: 'eleven_multilingual_v2'
    voices: ['202898wioas09d2', 'addwqr324tesfsf', '3asdasr3qrq44w', 'adsadsa']

If you want to add custom parameters, you can add them in the librechat.yaml file as follows:

⚠️
Only for ElevenLabs

The parameters under voice_settings and the pronunciation_dictionary_locators parameter are available only for ElevenLabs

librechat.yaml
tts:
  elevenlabs:
    # ...existing elevenlabs settings...
    voice_settings:
      similarity_boost: '' # number
      stability: '' # number
      style: '' # number
      use_speaker_boost: # boolean
    pronunciation_dictionary_locators: [''] # list of strings (array)

OpenAI compatible TTS services

Check the OpenAI TTS section; just change the url variable to the one you want to use. It should be a complete URL:

librechat.yaml
tts:
  openai:
    apiKey: 'sk-xxx'
    model: 'tts-1'
    voices: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
    url: "https://api.compatible.com/v1/audio/speech"

ElevenLabs compatible TTS services

Check the ElevenLabs section; just change the url variable to the one you want to use, as in the sketch below
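
For example, a minimal sketch assuming a hypothetical compatible provider at api.compatible.com; the exact base URL and the voice IDs depend on your provider:

librechat.yaml
tts:
  elevenlabs:
    url: 'https://api.compatible.com/v1' # hypothetical base URL
    apiKey: '${TTS_API_KEY}'
    model: 'eleven_multilingual_v2'
    voices: ['your-voice-id'] # placeholder voice ID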