OCR Config Object Structure

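Overview

The ocr object lets you configure Optical Character Recognition (OCR) settings for the application, enabling text extraction from images. This section describes the structure of the ocr object.

There are 4 main fields under ocr: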

mistralModel
apiKey
baseURL
strategy

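Notes:

If you use the Mistral OCR API, you do not need to edit the librechat.yaml file.
- You only need the following environment variables to get started: OCR_API_KEY and OCR_BASEURL.
OCR lets the application extract text from images, which can then be processed by AI models.
The default strategy is mistral_ocr, which uses Mistral's OCR capabilities.
You can also configure a custom OCR service by setting the strategy to custom_ocr.
Mistral OCR models deployed on Azure can be used by setting the strategy to azure_mistral_ocr.
Mistral OCR models deployed on Google Vertex AI can be used by setting the strategy to vertexai_mistral_ocr.
- Requires the GOOGLE_SERVICE_KEY_FILE environment variable to be set with service account credentials.
- The service key can be provided as a file path, a URL, base64-encoded JSON, or a raw JSON string.
- The project ID and location are extracted automatically from the service account credentials.
Local text extraction is available through document_parser, which extracts text from PDF, DOCX, XLS/XLSX, and OpenDocument files without any external API.
- Uses pdfjs-dist, mammoth, and SheetJS locally; no API key or base URL is needed.
- Only the strategy field is required; apiKey, baseURL, and mistralModel are ignored.
If you use the default Mistral OCR, you can optionally specify which Mistral model to use.
Environment variable parsing is supported for the apiKey, baseURL, and mistralModel parameters.
A user_provided strategy option is planned for a future release but is not yet implemented.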
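
To illustrate how environment variable references and defaults might combine, here is a minimal Python sketch. It is not LibreChat's actual code: the helper names resolve_value and resolve_ocr_config are hypothetical, and the "${NAME}" reference syntax is assumed from the Azure example on this page. The fallbacks to OCR_API_KEY, OCR_BASEURL, and mistral_ocr follow the defaults documented for each field.

```python
import re

# Matches a whole-string reference such as "${AZURE_MISTRAL_OCR_API_KEY}".
# Assumption: references are whole values, not embedded in longer strings.
ENV_REF = re.compile(r"\$\{(\w+)\}")


def resolve_value(value, env, fallback_var=None):
    """Resolve one config value (hypothetical helper).

    - A missing value falls back to env[fallback_var], if a fallback is named.
    - A "${NAME}" value is looked up in env.
    - Anything else is returned unchanged.
    """
    if value is None:
        return env.get(fallback_var) if fallback_var else None
    match = ENV_REF.fullmatch(value)
    if match:
        return env.get(match.group(1))
    return value


def resolve_ocr_config(ocr, env):
    """Return the effective settings for an optional ocr block (hypothetical helper)."""
    ocr = ocr or {}
    return {
        "strategy": ocr.get("strategy", "mistral_ocr"),
        "apiKey": resolve_value(ocr.get("apiKey"), env, "OCR_API_KEY"),
        "baseURL": resolve_value(ocr.get("baseURL"), env, "OCR_BASEURL"),
        "mistralModel": resolve_value(ocr.get("mistralModel"), env),
    }
```

With no ocr block at all, resolve_ocr_config(None, env) would pick up OCR_API_KEY and OCR_BASEURL from the environment and use the mistral_ocr strategy, which mirrors the note that the Mistral OCR API needs no librechat.yaml changes.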

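Automatic Document Parsing (No Configuration Required)

The built-in document_parser runs automatically for agent file uploads, even when no ocr block is configured in your librechat.yaml. This means PDF, DOCX, XLS/XLSX, and ODS files are parsed out of the box with no setup.

The resolution logic works as follows:

No ocr config present: when an agent context file is uploaded and its MIME type matches a supported document type (PDF, DOCX, Excel, ODS), document_parser is used directly. No OCR capability check is required for the agent.
ocr config present: the configured strategy (for example, mistral_ocr) is tried first. If it fails at runtime, document_parser is used as a fallback so that supported document types still have their text extracted.
Both fail: if both the configured strategy and the document parser fail (for example, the file is an image-only PDF with no embedded text), an error is returned suggesting that an OCR service is needed.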
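
The three cases above can be sketched in Python as follows. This is an illustrative model only, not LibreChat's implementation: extract_text, ExtractionError, and the callables passed in are hypothetical, and the MIME type set is an assumed approximation of the supported document types. The behavior for a non-document file with no ocr config is not described on this page, so the sketch simply raises an error in that case.

```python
# Assumed MIME types for the supported document formats (PDF, DOCX, Excel, ODS).
DOCUMENT_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",  # DOCX
    "application/vnd.ms-excel",  # XLS
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",  # XLSX
    "application/vnd.oasis.opendocument.spreadsheet",  # ODS
}


class ExtractionError(Exception):
    """Raised when no extraction path produced text (hypothetical error type)."""


def extract_text(file, ocr_config, strategies, document_parser):
    """Pick an extraction path for an uploaded file.

    file: dict with a "mime_type" key.
    ocr_config: the ocr block as a dict, or None when it is absent.
    strategies: dict mapping strategy names to callables taking the file.
    document_parser: callable taking the file (the local parser).
    """
    is_document = file["mime_type"] in DOCUMENT_MIME_TYPES

    # Case 1: no ocr config, so supported documents go straight to the parser.
    if ocr_config is None:
        if not is_document:
            # Assumption: this case is not specified in the text above.
            raise ExtractionError("No ocr config and not a supported document type")
        return document_parser(file)

    # Case 2: try the configured strategy first (mistral_ocr by default).
    run_strategy = strategies[ocr_config.get("strategy", "mistral_ocr")]
    try:
        return run_strategy(file)
    except Exception as strategy_error:
        if not is_document:
            raise ExtractionError("Configured OCR strategy failed") from strategy_error
        # Fall back to the local parser for supported document types.
        try:
            return document_parser(file)
        except Exception as parser_error:
            # Case 3: both paths failed, e.g. an image-only PDF.
            raise ExtractionError(
                "Text extraction failed; an OCR service may be required"
            ) from parser_error
```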

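document_parser handles text-based documents only. For image-based PDFs or scanned documents, you still need to configure an OCR strategy (for example, mistral_ocr) to extract text from the images in those files.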

Example

ocr:
  mistralModel: "mistral-ocr-latest"
  apiKey: "your-mistral-api-key"
  strategy: "mistral_ocr"

Example with custom OCR:

ocr:
  apiKey: "your-custom-ocr-api-key"
  baseURL: "https://your-custom-ocr-service.com/api"
  strategy: "custom_ocr"

Example with Azure Mistral OCR:

ocr:
  mistralModel: "deployed-mistral-ocr-2503" # should match deployment name on Azure
  apiKey: "${AZURE_MISTRAL_OCR_API_KEY}" # arbitrary .env var reference
  baseURL: "https://your-deployed-endpoint.models.ai.azure.com/v1" # hardcoded, can also be .env var reference
  strategy: "azure_mistral_ocr"

Example with Google Vertex AI Mistral OCR:

ocr:
  mistralModel: "mistral-ocr-2505" # model name as deployed in Vertex AI
  strategy: "vertexai_mistral_ocr"

Example with the local document parser (no external API required):

ocr:
  strategy: "document_parser"

mistralModel

Key	Type	Description	Notes
mistralModel	String	The Mistral model to use for OCR processing. For Azure deployments, this should match your deployment name. For Google Vertex AI, this should match the model name in your deployment.	Optional. Specifies which Mistral model should be used when the strategy is set to mistral_ocr, azure_mistral_ocr, or vertexai_mistral_ocr.

ocr:
  mistralModel: "mistral-ocr-latest"

For Azure deployments:

ocr:
  mistralModel: "deployed-mistral-ocr-2503" # Your Azure deployment name

For Google Vertex AI deployments:

ocr:
  mistralModel: "mistral-ocr-2505" # Your Vertex AI model name

apiKey

Key	Type	Description	Notes
apiKey	String	The API key for the OCR service. Not applicable to Google Vertex AI, which authenticates with a service account via GOOGLE_SERVICE_KEY_FILE.	Optional. Defaults to the environment variable OCR_API_KEY if not specified.

ocr:
  apiKey: "your-ocr-api-key"

baseURL

Key	Type	Description	Notes
baseURL	String	The base URL of the OCR service API. For Google Vertex AI, this is constructed automatically from the service account credentials.	Optional. Defaults to the environment variable OCR_BASEURL if not specified.

ocr:
  baseURL: "https://your-ocr-service.com/api"

strategy

Key	Type	Description	Notes
strategy	String	The OCR strategy to use.	Determines which OCR service to use. Options are "mistral_ocr", "azure_mistral_ocr", "vertexai_mistral_ocr", "document_parser", or "custom_ocr". Defaults to "mistral_ocr".

ocr:
  strategy: "custom_ocr"

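Available strategies:

mistral_ocr: Uses Mistral's OCR capabilities through the standard Mistral API.
azure_mistral_ocr: Uses Mistral OCR models deployed on Azure AI Foundry.
vertexai_mistral_ocr: Uses Mistral OCR models deployed on Google Cloud Vertex AI.
document_parser: Uses local text extraction for PDF, DOCX, XLS/XLSX, and OpenDocument files. No external API is required. It also runs automatically for agent file uploads when no ocr block is configured, and serves as the fallback when a configured OCR strategy fails.
custom_ocr: Uses a custom OCR service specified by baseURL (not yet implemented).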
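
As a quick way to see which fields matter for each strategy, the following Python sketch filters an ocr block down to the fields that strategy appears to read. The mapping is inferred from the descriptions and examples on this page and is an assumption, not taken from LibreChat's source; STRATEGY_FIELDS and effective_fields are hypothetical names.

```python
# Fields each strategy appears to read, inferred from this page (an assumption).
STRATEGY_FIELDS = {
    "mistral_ocr": ("mistralModel", "apiKey", "baseURL"),
    "azure_mistral_ocr": ("mistralModel", "apiKey", "baseURL"),
    "vertexai_mistral_ocr": ("mistralModel",),  # auth and URL come from the service account
    "document_parser": (),  # local parsing; other fields are ignored
    "custom_ocr": ("apiKey", "baseURL"),
}


def effective_fields(ocr):
    """Return only the fields relevant to the selected strategy (hypothetical helper)."""
    strategy = ocr.get("strategy", "mistral_ocr")
    if strategy not in STRATEGY_FIELDS:
        raise ValueError(f"Unknown OCR strategy: {strategy!r}")
    used = {key: ocr[key] for key in STRATEGY_FIELDS[strategy] if key in ocr}
    return {"strategy": strategy, **used}
```

For example, effective_fields({"strategy": "document_parser", "apiKey": "unused"}) drops apiKey, matching the note that only the strategy field is required for document_parser.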