February 19, 2024


Learn how to run AI models locally using Ollama


Unlock the Power of Ollama: Run Large Language Models on Your Local Hardware

Are you tired of relying on cloud-based services to run your language models? Do you want to tap into the potential of large language models without breaking the bank? Ollama is an open-source tool that lets you run large language models directly on your local hardware.

What Can Ollama Do?

With Ollama, you can:

  • Run large language models on your local hardware, minus the hefty cloud computing costs
  • Host multiple models with ease
  • Dynamically load models upon request, streamlining your workflow

Getting Started with Ollama

Ready to unlock the power of Ollama? Follow these simple steps to get started:

Install Ollama

You have two options to install Ollama: via the Ollama app or using Docker.

For Mac, Linux, and Windows users, follow the instructions on the Ollama Download page to get started. Ollama supports GPU acceleration on Nvidia, AMD, and Apple Metal, so you can harness the power of your local hardware.
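If you prefer the Docker route, the commands below sketch how to start the Ollama server in a container, based on the instructions published on Ollama's Docker Hub page (the image name, volume path, and port are Ollama's documented defaults; check the current docs before relying on them):

```shell
# Start the Ollama server in a container (CPU-only).
# The named volume keeps downloaded models across container restarts;
# port 11434 is Ollama's default API port.
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

# For Nvidia GPU acceleration, add --gpus=all
# (requires the NVIDIA Container Toolkit on the host):
# docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

Once the container is running, you can execute Ollama commands inside it with `docker exec -it ollama ollama run <model>`.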

Load Models in Ollama

Now that you have Ollama installed, it’s time to load your models. Here’s how:

  1. Browse the Ollama Library to explore available models.
  2. Copy the command from the Tags tab on the library website and paste it into your terminal. The command begins with ollama run.
  3. Check the model size to ensure it can run in GPU memory for optimal performance.
  4. Use /bye to exit the interactive session when you’re done.
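The steps above can be sketched in a short terminal session (mistral is used here as an example model; substitute whichever model you copied from the library):

```shell
# Download the model (if not already present) and start an interactive session
ollama run mistral

# Inside the session, type prompts directly; exit with:
#   >>> /bye

# List downloaded models with their sizes, to check they fit in GPU memory
ollama list
```

If a model is larger than your available GPU memory, Ollama falls back to slower CPU (or split CPU/GPU) inference, so the size check in step 3 is worth doing before committing to a large model.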

Configure LibreChat

Finally, use your librechat.yaml configuration file to add Ollama as a separate endpoint. Follow our Custom Endpoints & Configuration Guide for a step-by-step walkthrough.
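As a starting point, a minimal Ollama endpoint entry might look like the sketch below. The field names follow LibreChat's custom-endpoint schema, and the model name is only an example; treat the Custom Endpoints & Configuration Guide as the authoritative reference:

```yaml
# librechat.yaml — sketch of an Ollama custom endpoint
endpoints:
  custom:
    - name: "Ollama"
      apiKey: "ollama"          # Ollama ignores the key, but the field is required
      baseURL: "http://localhost:11434/v1/"
      models:
        default: ["mistral"]    # example: any model you've pulled with ollama run
        fetch: true             # ask the Ollama server which models are available
      titleConvo: true
      titleModel: "current_model"
```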

With Ollama, you can unlock the full potential of large language models on your local hardware. Say goodbye to cloud computing costs and hello to faster, more efficient workflows.