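OCR Config Object Structure

Overview

The ocr object configures Optical Character Recognition (OCR) settings for the application, enabling text extraction from images. This section describes the structure of the ocr object in detail.

There are 4 main fields under ocr: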

mistralModel
apiKey
baseURL
strategy

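Notes:

If you are using the Mistral OCR API, you do not need to edit your librechat.yaml file.
- You only need the following environment variables to get started: OCR_API_KEY and OCR_BASEURL.
OCR lets the application extract text from images; the extracted text can then be processed by AI models.
The default strategy is mistral_ocr, which uses Mistral's OCR capabilities.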
You can also point to a custom OCR service by setting the strategy to custom_ocr, although the strategy list below marks this option as not yet implemented.
Mistral OCR models deployed on Azure can be used by setting the strategy to azure_mistral_ocr.
Mistral OCR models deployed on Google Vertex AI can be used by setting the strategy to vertexai_mistral_ocr.
- The GOOGLE_SERVICE_KEY_FILE environment variable must be set to your service account credentials.
- The service key can be provided as a file path, a URL, base64-encoded JSON, or a raw JSON string.
- The project ID and location are extracted automatically from the service account credentials.
Local text extraction is available through document_parser, which extracts text from PDF, DOCX, XLS/XLSX, and OpenDocument files without any external API.
- It uses pdfjs-dist, mammoth, and SheetJS locally; no API key or base URL is required.
- Only the strategy field is required; apiKey, baseURL, and mistralModel are ignored.
When using the default Mistral OCR, you can optionally specify which Mistral model to use.
Environment variable parsing is supported for the apiKey, baseURL, and mistralModel parameters.
A user_provided strategy option is planned for a future release but is not yet implemented.
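
The defaults and environment variable parsing described in these notes can be pictured with a small Python sketch. It is illustrative only and is not the application's implementation: the function names resolve_value and resolve_ocr_config are hypothetical, and it assumes that a reference takes the whole-value form ${VAR_NAME}, as in the Azure example on this page.

```python
import os
import re
from typing import Any, Dict, Mapping, Optional

# Assumption: a reference is the entire value, written as ${VAR_NAME}.
_ENV_REF = re.compile(r"^\$\{(\w+)\}$")


def resolve_value(raw: Optional[str], env: Mapping[str, str],
                  fallback_var: Optional[str] = None) -> Optional[str]:
    """Resolve one field: env reference, literal value, or fallback variable."""
    if raw is None:
        # Field omitted: fall back to the documented default variable, if any.
        return env.get(fallback_var) if fallback_var else None
    match = _ENV_REF.match(raw)
    if match:
        return env.get(match.group(1))
    return raw  # hardcoded literal


def resolve_ocr_config(ocr: Dict[str, Any],
                       env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Hypothetical helper that applies the documented defaults."""
    return {
        "strategy": ocr.get("strategy", "mistral_ocr"),
        "apiKey": resolve_value(ocr.get("apiKey"), env, "OCR_API_KEY"),
        "baseURL": resolve_value(ocr.get("baseURL"), env, "OCR_BASEURL"),
        "mistralModel": resolve_value(ocr.get("mistralModel"), env),
    }
```

With an empty ocr mapping, this sketch falls back to OCR_API_KEY and OCR_BASEURL and selects mistral_ocr, which mirrors the statement that the Mistral OCR API needs no librechat.yaml changes.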

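Automatic Document Parsing (No Configuration Required)

The built-in document_parser runs automatically for agent file uploads even when no ocr block is configured in librechat.yaml. This means PDF, DOCX, XLS/XLSX, and ODS files are parsed out of the box, with no additional setup.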

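The resolution logic works as follows:

No ocr config present: if a file is uploaded as agent context and its MIME type matches a supported document type (PDF, DOCX, Excel, ODS), document_parser is used directly. No separate OCR capability check is required for the agent.
ocr config present: the configured strategy (e.g., mistral_ocr) is tried first. If it fails at runtime, document_parser is used as a fallback so that text extraction still succeeds for supported document types.
Both fail: if the configured strategy and the document parser both fail (e.g., an image-only PDF with no embedded text), an error is returned suggesting that an OCR service is required.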
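
The three cases above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's code: extract_text, OcrRequiredError, and the MIME type set are hypothetical names chosen for the example, and a failure is modeled as a raised exception.

```python
from typing import Callable, Optional

# Assumption: an illustrative set of MIME types, not the application's actual list.
DOCUMENT_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",  # DOCX
    "application/vnd.ms-excel",  # XLS
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",  # XLSX
    "application/vnd.oasis.opendocument.spreadsheet",  # ODS
}

Extractor = Callable[[bytes], str]


class OcrRequiredError(Exception):
    """Raised when neither the configured strategy nor the parser yields text."""


def extract_text(data: bytes, mime_type: str, document_parser: Extractor,
                 configured_strategy: Optional[Extractor] = None) -> str:
    # ocr config present: try the configured strategy first.
    if configured_strategy is not None:
        try:
            return configured_strategy(data)
        except Exception:
            pass  # runtime failure: fall through to document_parser
    # No ocr config, or fallback after a failure: use the local parser
    # when the MIME type is a supported document type.
    if mime_type in DOCUMENT_MIME_TYPES:
        try:
            return document_parser(data)
        except Exception:
            pass
    # Both fail (or nothing applicable): suggest configuring an OCR service.
    raise OcrRequiredError("text extraction failed; an OCR service may be required")
```

How unsupported MIME types are handled when no ocr config exists is not specified above; raising the same error in that case is a simplification made for this sketch.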

document_parser handles text-based documents only. For image-based PDFs or scanned documents, a configured OCR strategy (e.g., mistral_ocr) is still required to extract text from the images in those files.

Example

ocr:
  mistralModel: "mistral-ocr-latest"
  apiKey: "your-mistral-api-key"
  strategy: "mistral_ocr"

Example with custom OCR:

ocr:
  apiKey: "your-custom-ocr-api-key"
  baseURL: "https://your-custom-ocr-service.com/api"
  strategy: "custom_ocr"

Example with Azure Mistral OCR:

ocr:
  mistralModel: "deployed-mistral-ocr-2503" # should match deployment name on Azure
  apiKey: "${AZURE_MISTRAL_OCR_API_KEY}" # arbitrary .env var reference
  baseURL: "https://your-deployed-endpoint.models.ai.azure.com/v1" # hardcoded, can also be .env var reference
  strategy: "azure_mistral_ocr"

Example with Google Vertex AI Mistral OCR:

ocr:
  mistralModel: "mistral-ocr-2505" # model name as deployed in Vertex AI
  strategy: "vertexai_mistral_ocr"
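
For the Vertex AI strategy, the notes state that GOOGLE_SERVICE_KEY_FILE may hold a file path, a URL, base64-encoded JSON, or a raw JSON string. The Python sketch below shows one way such a value could be told apart. The function name classify_service_key and the order of the checks are assumptions made for illustration; the application's actual detection logic may differ.

```python
import base64
import json


def classify_service_key(value: str) -> str:
    """Guess which of the four documented forms a service key value takes."""
    v = value.strip()
    if v.startswith(("http://", "https://")):
        return "url"
    if v.startswith("{"):
        json.loads(v)  # raises ValueError if the raw JSON is malformed
        return "raw_json"
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # such as the dot in a typical file name.
        decoded = base64.b64decode(v, validate=True)
        if isinstance(json.loads(decoded), dict):
            return "base64_json"
    except ValueError:
        pass  # not base64-encoded JSON
    # Assumption: anything else is treated as a path on disk.
    return "file_path"
```

A caller would then read the file, fetch the URL, or decode the string accordingly before parsing the credentials.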

Example with the local document parser (no external API required):

ocr:
  strategy: "document_parser"

mistralModel

Key	Type	Description	Notes
mistralModel	String	The Mistral model to use for OCR processing. For Azure deployments, this should match the deployment name. For Google Vertex AI, it should match the deployed model name.	Optional. Specifies which Mistral model should be used when the strategy is set to mistral_ocr, azure_mistral_ocr, or vertexai_mistral_ocr.

ocr:
  mistralModel: "mistral-ocr-latest"

For Azure deployments:

ocr:
  mistralModel: "deployed-mistral-ocr-2503" # Your Azure deployment name

For Google Vertex AI deployments:

ocr:
  mistralModel: "mistral-ocr-2505" # Your Vertex AI model name

apiKey

Key	Type	Description	Notes
apiKey	String	The API key for the OCR service. Not used for Google Vertex AI, which authenticates with a service account via GOOGLE_SERVICE_KEY_FILE.	Optional. Defaults to the environment variable OCR_API_KEY if not specified.

ocr:
  apiKey: "your-ocr-api-key"

baseURL

Key	Type	Description	Notes
baseURL	String	The base URL of the OCR service API. For Google Vertex AI, it is constructed automatically from the service account credentials.	Optional. Defaults to the environment variable OCR_BASEURL if not specified.

ocr:
  baseURL: "https://your-ocr-service.com/api"

strategy

Key	Type	Description	Notes
strategy	String	The OCR strategy to use.	Determines which OCR service to use. Options are "mistral_ocr", "azure_mistral_ocr", "vertexai_mistral_ocr", "document_parser", or "custom_ocr". Defaults to "mistral_ocr".

ocr:
  strategy: "custom_ocr"

Available strategies:

mistral_ocr: Uses Mistral's OCR capabilities through the standard Mistral API.
azure_mistral_ocr: Uses Mistral OCR models deployed on Azure AI Foundry.
vertexai_mistral_ocr: Uses Mistral OCR models deployed on Google Cloud Vertex AI.
document_parser: Uses local text extraction for PDF, DOCX, XLS/XLSX, and OpenDocument files. No external API is required. It also runs automatically for agent file uploads when no ocr config is present, and acts as a fallback when the configured OCR strategy fails.
custom_ocr: Uses a custom OCR service specified by baseURL (not yet implemented).
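
As a rough way to see how the strategy value interacts with the other fields, the Python sketch below checks a parsed ocr mapping against the rules stated on this page. The helper name check_ocr_config and the wording of its notes are hypothetical; the sketch only restates the documented behavior and is not part of the application.

```python
from typing import Any, Dict, List

STRATEGIES = (
    "mistral_ocr",
    "azure_mistral_ocr",
    "vertexai_mistral_ocr",
    "document_parser",
    "custom_ocr",
)
DEFAULT_STRATEGY = "mistral_ocr"


def check_ocr_config(ocr: Dict[str, Any]) -> List[str]:
    """Return advisory notes for an ocr mapping; raise on an unknown strategy."""
    strategy = ocr.get("strategy", DEFAULT_STRATEGY)
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown OCR strategy: {strategy!r}")
    notes: List[str] = []
    if strategy == "document_parser":
        # Only the strategy field matters; the other fields are ignored.
        ignored = [k for k in ("apiKey", "baseURL", "mistralModel") if k in ocr]
        if ignored:
            notes.append("ignored by document_parser: " + ", ".join(ignored))
    elif strategy == "vertexai_mistral_ocr":
        # Vertex AI authenticates with a service account, not an API key.
        if "apiKey" in ocr:
            notes.append("apiKey is not used with vertexai_mistral_ocr")
    elif strategy == "custom_ocr":
        notes.append("custom_ocr is listed as not yet implemented")
    return notes
```

For example, check_ocr_config({"strategy": "document_parser", "apiKey": "x"}) returns a single note saying that apiKey is ignored.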