Ollama

Example configuration for Ollama

Ollama API key: required but ignored (see Ollama OpenAI Compatibility)

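Because the key is required by the OpenAI-compatible route but its value is ignored, any placeholder works. A quick way to verify this against a locally running Ollama server (default port 11434; the model name here is illustrative):

```shell
# The Authorization header must be present, but its value is ignored by Ollama.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{"model": "llama2", "messages": [{"role": "user", "content": "Hello"}]}'
```
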
Notes:

  • Known: icon provided.
  • Download models with the `ollama run` command. See the Ollama Library.
  • It is recommended to use the value "current_model" for the titleModel to avoid loading more than one model per conversation.
    • Doing so will dynamically use the current conversation's model for title generation.
  • The example includes a top 5 popular model list from the Ollama Library, which was last updated on March 1, 2024, for your convenience.
  custom:
    - name: "Ollama"
      apiKey: "ollama"
      # use 'host.docker.internal' instead of localhost if running LibreChat in a docker container
      baseURL: "http://localhost:11434/v1/" 
      models:
        default: [
          "llama2",
          "mistral",
          "codellama",
          "dolphin-mixtral",
          "mistral-openorca"
          ]
        # fetching list of models is supported but the `name` field must start
        # with `ollama` (case-insensitive), as it does in this example.
        fetch: true
      titleConvo: true
      titleModel: "current_model"
      summarize: false
      summaryModel: "current_model"
      modelDisplayLabel: "Ollama"

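The models listed in the config above must be downloaded locally before LibreChat can use them. For example (model names are illustrative; this assumes a local Ollama install):

```shell
# Download a model without starting a chat session:
ollama pull mistral
# Or download and chat in one step:
ollama run llama2
# List the models available locally:
ollama list
```
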
Ollama -> llama3

Note: since `stop` was removed from the default parameters, the issue highlighted below should no longer occur.

However, if you experience the behavior where llama3 does not stop generating, add this `addParams` block to the config:

  custom:
    - name: "Ollama"
      apiKey: "ollama"
      baseURL: "http://host.docker.internal:11434/v1/"
      models:
        default: [
          "llama3"
        ]
        fetch: false # fetching list of models is not supported
      titleConvo: true
      titleModel: "current_model"
      summarize: false
      summaryModel: "current_model"
      modelDisplayLabel: "Ollama"
      addParams:
        "stop": [
          "<|start_header_id|>",
          "<|end_header_id|>",
          "<|eot_id|>",
          "<|reserved_special_token"
          ]

If you are only using llama3 with Ollama, it is fine to set the `stop` parameter at the config level via `addParams`.

However, if you are using multiple models, it's now recommended to add stop sequences from the frontend via conversation parameters and presets.

For example, we can omit `addParams`:

  custom:
    - name: "Ollama"
      apiKey: "ollama"
      baseURL: "http://host.docker.internal:11434/v1/"
      models:
        default: [
          "llama3:latest",
          "mistral"
          ]
        fetch: false # fetching list of models is not supported
      titleConvo: true
      titleModel: "current_model"
      modelDisplayLabel: "Ollama"

And set the stop sequences from the conversation parameters panel (best to also save them as a preset):

[Image: conversation settings with the llama3 stop sequences added]
