LibreChat

OCR Config Object Structure

Overview

The ocr object in librechat.yaml configures Optical Character Recognition (OCR) for the application, enabling text extraction from images and image-based documents such as scanned PDFs. This section breaks down each field of the ocr object.

There are 4 main fields under ocr:

  • mistralModel
  • apiKey
  • baseURL
  • strategy

Notes:

  • If you use the Mistral OCR API, you don't need to edit your librechat.yaml file.
    • Setting the OCR_API_KEY and OCR_BASEURL environment variables is enough to get started.
  • OCR functionality allows the application to extract text from images, which can then be processed by AI models.
  • The default strategy is mistral_ocr, which uses Mistral's OCR capabilities.
  • A custom OCR service can be selected by setting the strategy to custom_ocr, although this strategy is not yet implemented.
  • Azure-deployed Mistral OCR models can be used by setting the strategy to azure_mistral_ocr.
  • Google Vertex AI-deployed Mistral OCR models can be used by setting the strategy to vertexai_mistral_ocr.
    • Requires the GOOGLE_SERVICE_KEY_FILE environment variable, set to your service account credentials.
    • The service key can be provided as a file path, a URL, base64-encoded JSON, or a raw JSON string.
    • The project ID and location are extracted automatically from the service account credentials.
  • Local text extraction is available via document_parser, which extracts text from PDF, DOCX, XLS/XLSX, and OpenDocument files without any external API.
    • Uses pdfjs-dist, mammoth, and SheetJS locally, so no API key or base URL is needed.
    • Only the strategy field is required; apiKey, baseURL, and mistralModel are ignored.
  • With the default Mistral OCR strategy, you can optionally set mistralModel to choose a specific Mistral model.
  • Environment variable parsing is supported for apiKey, baseURL, and mistralModel parameters.
  • A user_provided strategy option is planned for future releases but is not yet implemented.
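Because apiKey, baseURL, and mistralModel support environment variable parsing, all three can be kept out of librechat.yaml itself. The following is a minimal sketch using the same ${VAR} reference syntax as the Azure example on this page. OCR_API_KEY and OCR_BASEURL are the variables named above; OCR_MISTRAL_MODEL is a hypothetical name that you would define yourself.

```yaml
ocr:
  # ${...} values are resolved from environment variables (e.g. your .env file)
  mistralModel: "${OCR_MISTRAL_MODEL}"  # hypothetical variable name, define it yourself
  apiKey: "${OCR_API_KEY}"
  baseURL: "${OCR_BASEURL}"
  strategy: "mistral_ocr"
```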

Automatic Document Parsing (No Configuration Required)

The built-in document_parser runs automatically for agent file uploads even when no ocr block is configured in your librechat.yaml. This means PDF, DOCX, XLS/XLSX, and ODS files are parsed out of the box without any setup.

The resolution logic works as follows:

  1. No ocr config exists: when an agent context file is uploaded and its MIME type matches a supported document type (PDF, DOCX, Excel, ODS), the document_parser is used directly. No OCR capability check is required for the agent.

  2. ocr config exists: the configured strategy (e.g., mistral_ocr) is tried first. If the configured strategy fails at runtime, the document_parser is used as a fallback so text extraction still succeeds for supported document types.

  3. Neither succeeds: if both the configured strategy and the document parser fail (e.g., the file is an image-only PDF with no embedded text), an error is returned suggesting that an OCR service is needed.

The document_parser handles text-based documents only. For image-based PDFs or scanned documents, you still need a configured OCR strategy (such as mistral_ocr) to extract text from the images within those files.
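As a sketch of the second case in the resolution logic above, the block below keeps Mistral OCR as the primary strategy. The comments restate the fallback behavior described on this page; the OCR_API_KEY reference is just one way of supplying the key.

```yaml
ocr:
  # Tried first for agent file uploads.
  strategy: "mistral_ocr"
  apiKey: "${OCR_API_KEY}"
  # If mistral_ocr fails at runtime, the built-in document_parser is used as a
  # fallback for PDF, DOCX, XLS/XLSX, and ODS files that contain embedded text.
  # Image-only or scanned PDFs cannot be parsed that way, so if mistral_ocr
  # fails for them, an error is returned instead.
```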

Example

ocr:
  mistralModel: "mistral-ocr-latest"
  apiKey: "your-mistral-api-key"
  strategy: "mistral_ocr"

Example with custom OCR (the custom_ocr strategy is not yet implemented):

ocr:
  apiKey: "your-custom-ocr-api-key"
  baseURL: "https://your-custom-ocr-service.com/api"
  strategy: "custom_ocr"

Example with Azure Mistral OCR:

ocr:
  mistralModel: "deployed-mistral-ocr-2503" # should match deployment name on Azure
  apiKey: "${AZURE_MISTRAL_OCR_API_KEY}" # arbitrary .env var reference
  baseURL: "https://your-deployed-endpoint.models.ai.azure.com/v1" # hardcoded, can also be .env var reference
  strategy: "azure_mistral_ocr"

Example with Google Vertex AI Mistral OCR:

ocr:
  mistralModel: "mistral-ocr-2505" # model name as deployed in Vertex AI
  strategy: "vertexai_mistral_ocr"

Example with local document parser (no external API needed):

ocr:
  strategy: "document_parser"
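Since document_parser ignores apiKey, baseURL, and mistralModel, moving an existing Mistral setup to local parsing should only require changing strategy. The following is a sketch that assumes you are editing a block that previously used mistral_ocr and want to keep its values around.

```yaml
ocr:
  strategy: "document_parser"
  # Ignored by document_parser (no external API is called); kept only so
  # switching back to mistral_ocr later is a one-line change.
  mistralModel: "mistral-ocr-latest"
  apiKey: "${OCR_API_KEY}"
```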

mistralModel

  • Key: mistralModel
  • Type: String
  • Description: The Mistral model to use for OCR processing. For Azure deployments, this should match your deployment name. For Google Vertex AI, this should match the model name in your deployment.
  • Notes: Optional. Specifies which Mistral model is used when the strategy is set to mistral_ocr, azure_mistral_ocr, or vertexai_mistral_ocr.

ocr:
  mistralModel: "mistral-ocr-latest"

For Azure deployments:

ocr:
  mistralModel: "deployed-mistral-ocr-2503" # Your Azure deployment name

For Google Vertex AI deployments:

ocr:
  mistralModel: "mistral-ocr-2505" # Your Vertex AI model name

apiKey

  • Key: apiKey
  • Type: String
  • Description: The API key for the OCR service. Not used for Google Vertex AI, which authenticates with the service account in GOOGLE_SERVICE_KEY_FILE.
  • Notes: Optional. Defaults to the environment variable OCR_API_KEY if not specified.

ocr:
  apiKey: "your-ocr-api-key"

baseURL

  • Key: baseURL
  • Type: String
  • Description: The base URL for the OCR service API. For Google Vertex AI, this is constructed automatically from the service account credentials.
  • Notes: Optional. Defaults to the environment variable OCR_BASEURL if not specified.

ocr:
  baseURL: "https://your-ocr-service.com/api"

strategy

  • Key: strategy
  • Type: String
  • Description: The OCR strategy to use.
  • Notes: Determines which OCR service to use. Options are "mistral_ocr", "azure_mistral_ocr", "vertexai_mistral_ocr", "document_parser", or "custom_ocr". Defaults to "mistral_ocr".

ocr:
  strategy: "custom_ocr"
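Because strategy defaults to "mistral_ocr", it can be left out when you use the standard Mistral API. The following is a minimal sketch relying on that default. The apiKey reference is illustrative and could equally come from the OCR_API_KEY default.

```yaml
ocr:
  # strategy omitted, so the default "mistral_ocr" applies
  mistralModel: "mistral-ocr-latest"
  apiKey: "${OCR_API_KEY}"
```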

Available Strategies:

  • mistral_ocr: Uses Mistral's OCR capabilities via the standard Mistral API.
  • azure_mistral_ocr: Uses Mistral OCR models deployed on Azure AI Foundry.
  • vertexai_mistral_ocr: Uses Mistral OCR models deployed on Google Cloud Vertex AI.
  • document_parser: Uses local text extraction for PDF, DOCX, XLS/XLSX, and OpenDocument files. No external API needed. Also runs automatically for agent file uploads when no ocr config is present, and as a fallback when a configured OCR strategy fails.
  • custom_ocr: Uses a custom OCR service specified by the baseURL (not yet implemented).
