LibreChat

Summarization Configuration

Overview

The summarization configuration provides centralized control over conversation summarization and context pruning. It replaces the per-endpoint `summarize` and `summaryModel` fields that were previously available on custom and Azure OpenAI endpoints.

When a conversation exceeds the model's context window, the summarization system automatically compresses older messages into a concise checkpoint summary. This allows conversations to continue indefinitely without losing important context. The system also includes context pruning, which progressively degrades large tool results in older messages to reclaim token space before summarization is needed.

Example

```yaml
summarization:
  provider: "openAI"
  model: "gpt-4o-mini"
  maxSummaryTokens: 4096
  reserveRatio: 0.05
  trigger:
    type: "token_ratio"
    value: 0.8
  contextPruning:
    enabled: true
    keepLastAssistants: 3
    softTrimRatio: 0.3
    hardClearRatio: 0.5
    minPrunableToolChars: 50000
    softTrim:
      maxChars: 4000
      headChars: 1500
      tailChars: 1500
    hardClear:
      enabled: true
      placeholder: "[Old tool result content cleared]"
```

provider

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `provider` | String | The LLM provider to use for summarization calls. If omitted, uses the agent's own provider. | `provider: "openAI"` |

Default: Agent's own provider

model

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `model` | String | The model to use for summarization calls. If omitted, uses the agent's own model. | `model: "gpt-4o-mini"` |

Default: Agent's own model

parameters

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `parameters` | Object | Additional LLM parameters for summarization requests (e.g., temperature, top_p). | `parameters: { temperature: 0.3 }` |

prompt

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `prompt` | String | Custom prompt for initial summarization. Replaces the built-in checkpoint prompt. | |

Default: A structured checkpoint prompt that produces sections for Goal, Constraints & Preferences, Progress, Key Decisions, Next Steps, and Critical Context.

updatePrompt

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `updatePrompt` | String | Custom prompt for re-compaction when a prior summary already exists. Used when the summary needs to be updated with new conversation content. | |

Default: A built-in prompt that merges new messages into the existing checkpoint, compresses older details, and gives recent actions more detail.

maxSummaryTokens

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `maxSummaryTokens` | Number | Maximum number of output tokens for the summarization model response. | `maxSummaryTokens: 4096` |

reserveRatio

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `reserveRatio` | Number | Fraction of the token budget reserved as headroom (0–1). Prevents the context from being filled to absolute capacity. | `reserveRatio: 0.05` |

Default: 0.05 (5% headroom)
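As a rough illustration of the headroom math (the function name and numbers below are assumptions for illustration, not LibreChat internals):

```python
# Illustrative sketch of how reserveRatio shrinks the usable token budget.
def effective_budget(context_window: int, reserve_ratio: float = 0.05) -> int:
    """Token budget after reserving a fraction of the window as headroom."""
    return int(context_window * (1 - reserve_ratio))

# With a hypothetical 128k-token window and the default 5% reserve:
print(effective_budget(128_000))  # 121600
```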

trigger

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `trigger` | Object | Defines when summarization is activated. If omitted, summarization fires whenever message pruning drops any messages. | |

trigger Sub-keys

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `type` | String | The trigger strategy. Options: `"token_ratio"`, `"remaining_tokens"`, `"messages_to_refine"`. | `type: "token_ratio"` |
| `value` | Number | The threshold value for the chosen trigger type. | `value: 0.8` |

Trigger Types

| Type | Value | Fires When |
|------|-------|------------|
| `token_ratio` | 0.0–1.0 | The fraction of context tokens used reaches or exceeds the value |
| `remaining_tokens` | Number | The number of remaining context tokens drops to or below the value |
| `messages_to_refine` | Number | The count of messages eligible for summarization reaches or exceeds the value |
| (not set) | | Summarization fires whenever pruning drops any messages (default behavior) |

Example:

```yaml
summarization:
  trigger:
    type: "remaining_tokens"
    value: 8000
```

contextPruning

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `contextPruning` | Object | Configures position-based tool result degradation. Large tool results in older messages are progressively trimmed or cleared to reclaim token space. | |

Context pruning is an opt-in feature that operates independently of summarization. It targets large tool call results in older messages, applying two progressive stages:

  1. Soft trim — Truncates tool results to keep only the head and tail portions, with an ellipsis in between
  2. Hard clear — Replaces the entire tool result with a short placeholder

Both stages are position-based: messages closer to the beginning of the conversation (older) are pruned first.
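One way to picture the two thresholds (a sketch under assumed semantics; the exact age calculation LibreChat uses may differ):

```python
def pruning_stage(msg_index: int, total_msgs: int,
                  soft_trim_ratio: float = 0.3,
                  hard_clear_ratio: float = 0.5) -> str:
    """Pick a pruning stage from a message's relative age (hypothetical)."""
    # age == 1.0 for the oldest message, approaching 0.0 for the newest.
    age = 1 - msg_index / total_msgs
    if age >= hard_clear_ratio:
        return "hard_clear"
    if age >= soft_trim_ratio:
        return "soft_trim"
    return "keep"

# In a 10-message conversation with the default ratios:
print(pruning_stage(0, 10))  # hard_clear (oldest)
print(pruning_stage(6, 10))  # soft_trim
print(pruning_stage(9, 10))  # keep (newest)
```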

contextPruning Sub-keys

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `enabled` | Boolean | Enables position-based tool result degradation. | `enabled: true` |
| `keepLastAssistants` | Number | Number of recent assistant turns to protect from any pruning. | `keepLastAssistants: 3` |
| `softTrimRatio` | Number | Age ratio (0–1) at which soft-trim activates. Messages older than this ratio of the conversation are candidates for soft-trimming. | `softTrimRatio: 0.3` |
| `hardClearRatio` | Number | Age ratio (0–1) at which hard-clear activates. Messages older than this ratio are candidates for full replacement. | `hardClearRatio: 0.5` |
| `minPrunableToolChars` | Number | Minimum character count of a tool result before pruning applies. Smaller results are left untouched. | `minPrunableToolChars: 50000` |
| `softTrim` | Object | Configuration for the soft-trim stage. | |
| `hardClear` | Object | Configuration for the hard-clear stage. | |

Defaults:

| Field | Default |
|-------|---------|
| `enabled` | `false` |
| `keepLastAssistants` | `3` |
| `softTrimRatio` | `0.3` |
| `hardClearRatio` | `0.5` |
| `minPrunableToolChars` | `50000` |

softTrim Sub-keys

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `maxChars` | Number | Maximum total characters after soft-trimming a tool result. | `maxChars: 4000` |
| `headChars` | Number | Number of characters to preserve from the beginning of the tool result. | `headChars: 1500` |
| `tailChars` | Number | Number of characters to preserve from the end of the tool result. | `tailChars: 1500` |

Defaults: `maxChars: 4000`, `headChars: 1500`, `tailChars: 1500`
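The head/tail behavior can be sketched like this (a hypothetical helper; the actual marker text LibreChat inserts between the two halves may differ):

```python
def soft_trim(text: str, max_chars: int = 4000,
              head_chars: int = 1500, tail_chars: int = 1500) -> str:
    """Truncate an oversized tool result, keeping its head and tail."""
    if len(text) <= max_chars:
        return text  # small enough: leave untouched
    # Keep the beginning and end, drop the middle.
    return text[:head_chars] + "\n…\n" + text[-tail_chars:]

trimmed = soft_trim("x" * 50_000)
print(len(trimmed))  # 3003: 1500 head + 1500 tail + 3-char separator
```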

hardClear Sub-keys

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `enabled` | Boolean | Whether the hard-clear stage is active. When disabled, only soft-trim is applied. | `enabled: true` |
| `placeholder` | String | Placeholder text that replaces the full tool result content when hard-cleared. | `placeholder: "[Old tool result content cleared]"` |

Defaults: `enabled: true`, `placeholder: "[Old tool result content cleared]"`

Example:

```yaml
summarization:
  contextPruning:
    enabled: true
    keepLastAssistants: 5
    softTrimRatio: 0.25
    hardClearRatio: 0.6
    minPrunableToolChars: 30000
    softTrim:
      maxChars: 6000
      headChars: 2500
      tailChars: 2500
    hardClear:
      enabled: true
      placeholder: "[Content removed for context management]"
```

Complete Configuration Example

```yaml
version: 1.3.8
cache: true

summarization:
  provider: "openAI"
  model: "gpt-4o-mini"
  maxSummaryTokens: 4096
  reserveRatio: 0.05
  trigger:
    type: "token_ratio"
    value: 0.8
  contextPruning:
    enabled: true
    keepLastAssistants: 3
    softTrimRatio: 0.3
    hardClearRatio: 0.5
    minPrunableToolChars: 50000
    softTrim:
      maxChars: 4000
      headChars: 1500
      tailChars: 1500
    hardClear:
      enabled: true
      placeholder: "[Old tool result content cleared]"
```

Migration from Per-Endpoint Settings

If you previously used `summarize` and `summaryModel` on custom or Azure OpenAI endpoints:

```yaml
endpoints:
  custom:
    - name: "My Endpoint"
      summarize: true
      summaryModel: "gpt-3.5-turbo"
```

These fields have been removed. Use the top-level `summarization` configuration instead:

```yaml
summarization:
  model: "gpt-4o-mini"
```

Notes

  • Summarization is configured globally rather than per-endpoint
  • The `summarize` and `summaryModel` fields on custom endpoints and Azure OpenAI endpoints are no longer supported
  • When `provider` and `model` are omitted, the agent's own provider and model are used for summarization
  • Context pruning is disabled by default and must be explicitly enabled with `contextPruning.enabled: true`
  • Context pruning only affects tool call results that exceed `minPrunableToolChars` — smaller results are never pruned
  • The `keepLastAssistants` setting protects recent turns from pruning regardless of the trim/clear ratios
  • Custom `prompt` and `updatePrompt` values fully replace the built-in prompts — use with care
  • Set `AGENT_DEBUG_LOGGING=true` in your `.env` file to enable verbose logging of token counts and context pruning diagnostics
