OCR Config Object Structure
Overview
The ocr object allows you to configure Optical Character Recognition (OCR) settings for the application, enabling the extraction of text from images. This section provides a detailed breakdown of the ocr object structure.
The `ocr` object has four main fields:
- `mistralModel`
- `apiKey`
- `baseURL`
- `strategy`
Notes:
- If you are using the Mistral OCR API, you don't need to edit your `librechat.yaml` file.
  - You only need the following environment variables to get started: `OCR_API_KEY` and `OCR_BASEURL`.
- OCR functionality allows the application to extract text from images, which can then be processed by AI models.
- The default strategy is `mistral_ocr`, which uses Mistral's OCR capabilities.
- A custom OCR service can be configured by setting the strategy to `custom_ocr` (not yet implemented).
- Azure-deployed Mistral OCR models can be used by setting the strategy to `azure_mistral_ocr`.
- Google Vertex AI-deployed Mistral OCR models can be used by setting the strategy to `vertexai_mistral_ocr`.
  - Requires the `GOOGLE_SERVICE_KEY_FILE` environment variable to be set with service account credentials.
  - The service key can be provided as a file path, a URL, base64-encoded JSON, or a raw JSON string.
  - The project ID and location are automatically extracted from the service account credentials.
- Local text extraction is available via `document_parser`, which extracts text from PDF, DOCX, XLS/XLSX, and OpenDocument files without any external API.
  - Uses `pdfjs-dist`, `mammoth`, and `SheetJS` locally; no API key or base URL is needed.
  - Only the `strategy` field is required; `apiKey`, `baseURL`, and `mistralModel` are ignored.
- If you are using the default Mistral OCR, you may optionally specify which Mistral model to use.
- Environment variable parsing is supported for the `apiKey`, `baseURL`, and `mistralModel` parameters.
- A `user_provided` strategy option is planned for a future release but is not yet implemented.
Automatic Document Parsing (No Configuration Required)
The built-in document_parser runs automatically for agent file uploads even when no ocr block is configured in your librechat.yaml. This means PDF, DOCX, XLS/XLSX, and ODS files are parsed out of the box without any setup.
The resolution logic works as follows:
- No `ocr` config exists: When an agent context file is uploaded and its MIME type matches a supported document type (PDF, DOCX, Excel, ODS), the `document_parser` is used directly. No OCR capability check is required for the agent.
- An `ocr` config exists: The configured strategy (e.g., `mistral_ocr`) is tried first. If it fails at runtime, the `document_parser` is used as a fallback so text extraction still succeeds for supported document types.
- Neither succeeds: If both the configured strategy and the document parser fail (e.g., the file is an image-only PDF with no embedded text), an error is returned suggesting that an OCR service is needed.
The document_parser handles text-based documents only. For image-based PDFs or scanned documents, you still need a configured OCR strategy (such as mistral_ocr) to extract text from the images within those files.
Example
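A minimal sketch of an `ocr` block for the default Mistral OCR API. It assumes LibreChat's `${VAR_NAME}` syntax for referencing environment variables; the model name `mistral-ocr-latest` and the endpoint `https://api.mistral.ai/v1` are assumptions about a standard Mistral setup, so substitute your own values if they differ.

```yaml
# librechat.yaml (excerpt)
ocr:
  mistralModel: "mistral-ocr-latest"    # assumed model name; optional
  apiKey: "${OCR_API_KEY}"              # resolved from the environment
  baseURL: "https://api.mistral.ai/v1"  # assumed Mistral API endpoint
  strategy: "mistral_ocr"               # the default strategy
```

With this block present, agent uploads of PDF, DOCX, XLS/XLSX, or ODS files try `mistral_ocr` first and fall back to the local `document_parser` if the OCR call fails at runtime.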
Example with custom OCR:
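An illustrative sketch only, since `custom_ocr` is listed as not yet implemented. The variable names `CUSTOM_OCR_API_KEY` and `CUSTOM_OCR_BASEURL` are hypothetical placeholders.

```yaml
ocr:
  apiKey: "${CUSTOM_OCR_API_KEY}"   # hypothetical env var name
  baseURL: "${CUSTOM_OCR_BASEURL}"  # hypothetical env var name; points at your OCR service
  strategy: "custom_ocr"            # listed as not yet implemented
```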
Example with Azure Mistral OCR:
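A sketch for a Mistral OCR model deployed on Azure AI Foundry. The deployment name `my-mistral-ocr-deployment` and the variable names `AZURE_MISTRAL_OCR_API_KEY` and `AZURE_MISTRAL_OCR_BASEURL` are hypothetical; take the real values from your Azure deployment.

```yaml
ocr:
  mistralModel: "my-mistral-ocr-deployment"  # hypothetical; must match your Azure deployment name
  apiKey: "${AZURE_MISTRAL_OCR_API_KEY}"     # hypothetical env var name
  baseURL: "${AZURE_MISTRAL_OCR_BASEURL}"    # hypothetical env var holding your deployment endpoint
  strategy: "azure_mistral_ocr"
```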
Example with Google Vertex AI Mistral OCR:
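A sketch for Google Vertex AI. No `apiKey` or `baseURL` is set: authentication uses the service account in `GOOGLE_SERVICE_KEY_FILE`, and the base URL is built from those credentials. The model name `mistral-ocr-2505` is an assumption; use the model name from your Vertex AI deployment.

```yaml
ocr:
  mistralModel: "mistral-ocr-2505"  # assumed; match your Vertex AI model name
  strategy: "vertexai_mistral_ocr"
  # No apiKey/baseURL: credentials are read from GOOGLE_SERVICE_KEY_FILE
```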
Example with local document parser (no external API needed):
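Because only `strategy` is required for the local parser, the block reduces to a single field; any `apiKey`, `baseURL`, or `mistralModel` values would be ignored.

```yaml
ocr:
  strategy: "document_parser"  # local extraction via pdfjs-dist, mammoth, and SheetJS
```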
mistralModel
| Key | Type | Description | Notes |
|---|---|---|---|
| mistralModel | String | The Mistral model to use for OCR processing. For Azure deployments, this should match your deployment name. For Google Vertex AI, this should match the model name in your deployment. | Optional. Specifies which Mistral model should be used when the strategy is set to mistral_ocr, azure_mistral_ocr, or vertexai_mistral_ocr. |
For Azure deployments:
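Since environment variable parsing is supported for `mistralModel`, the Azure deployment name can also be kept out of the file. `AZURE_MISTRAL_OCR_DEPLOYMENT` is a hypothetical variable name, and the `${VAR_NAME}` syntax is assumed.

```yaml
ocr:
  mistralModel: "${AZURE_MISTRAL_OCR_DEPLOYMENT}"  # hypothetical env var; its value must equal your Azure deployment name
  strategy: "azure_mistral_ocr"
```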
For Google Vertex AI deployments:
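For Vertex AI, `mistralModel` names the model rather than a deployment. The value below is an assumed example; use whatever name your Vertex AI deployment exposes.

```yaml
ocr:
  mistralModel: "mistral-ocr-2505"  # assumed; must match the model name in your Vertex AI deployment
  strategy: "vertexai_mistral_ocr"
```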
apiKey
| Key | Type | Description | Notes |
|---|---|---|---|
| apiKey | String | The API key for the OCR service. Not used for Google Vertex AI (uses service account authentication via GOOGLE_SERVICE_KEY_FILE). | Optional. Defaults to the environment variable OCR_API_KEY if not specified. |
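
A sketch that points `apiKey` at a differently named environment variable, here the hypothetical `MY_OCR_KEY` (assuming the `${VAR_NAME}` syntax). Removing the `apiKey` line makes it fall back to `OCR_API_KEY`.

```yaml
ocr:
  apiKey: "${MY_OCR_KEY}"  # hypothetical variable name; omit to use OCR_API_KEY
  strategy: "mistral_ocr"
```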
baseURL
| Key | Type | Description | Notes |
|---|---|---|---|
| baseURL | String | The base URL for the OCR service API. For Google Vertex AI, this is automatically constructed from the service account credentials. | Optional. Defaults to the environment variable OCR_BASEURL if not specified. |
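
A sketch that sets `baseURL` explicitly instead of relying on `OCR_BASEURL`. The URL shown is the assumed standard Mistral API endpoint.

```yaml
ocr:
  baseURL: "https://api.mistral.ai/v1"  # assumed Mistral endpoint; omit to use OCR_BASEURL
  strategy: "mistral_ocr"
```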
strategy
| Key | Type | Description | Notes |
|---|---|---|---|
| strategy | String | The OCR strategy to use. | Determines which OCR service to use. Options are "mistral_ocr", "azure_mistral_ocr", "vertexai_mistral_ocr", "document_parser", or "custom_ocr". Defaults to "mistral_ocr". |
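
Since `strategy` defaults to `"mistral_ocr"`, it can be left out entirely for the standard Mistral API. The model name below is an assumption.

```yaml
ocr:
  # strategy omitted: "mistral_ocr" is used by default
  # apiKey and baseURL omitted: OCR_API_KEY and OCR_BASEURL are used
  mistralModel: "mistral-ocr-latest"  # assumed model name
```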
Available Strategies:
- `mistral_ocr`: Uses Mistral's OCR capabilities via the standard Mistral API.
- `azure_mistral_ocr`: Uses Mistral OCR models deployed on Azure AI Foundry.
- `vertexai_mistral_ocr`: Uses Mistral OCR models deployed on Google Cloud Vertex AI.
- `document_parser`: Uses local text extraction for PDF, DOCX, XLS/XLSX, and OpenDocument files. No external API is needed. It also runs automatically for agent file uploads when no `ocr` config is present, and as a fallback when a configured OCR strategy fails.
- `custom_ocr`: Uses a custom OCR service specified by the `baseURL` (not yet implemented).