LibreChat

Summarization Configuration

Overview

The summarization configuration provides centralized control over conversation summarization and context pruning. It replaces the per-endpoint `summarize` and `summaryModel` fields that were previously available on custom and Azure OpenAI endpoints.

When a conversation exceeds the model's context window, the summarization system automatically compresses older messages into a concise checkpoint summary. This allows conversations to continue indefinitely without losing important context. The system also includes context pruning, which progressively degrades large tool results in older messages to reclaim token space before summarization is needed.

Example

```yaml
summarization:
  provider: "openAI"
  model: "gpt-4o-mini"
  maxSummaryTokens: 4096
  reserveRatio: 0.05
  trigger:
    type: "token_ratio"
    value: 0.8
  contextPruning:
    enabled: true
    keepLastAssistants: 3
    softTrimRatio: 0.3
    hardClearRatio: 0.5
    minPrunableToolChars: 50000
    softTrim:
      maxChars: 4000
      headChars: 1500
      tailChars: 1500
    hardClear:
      enabled: true
      placeholder: "[Old tool result content cleared]"
```

provider

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `provider` | String | The LLM provider to use for summarization calls. If omitted, uses the agent's own provider. | `provider: "openAI"` |

Default: Agent's own provider

model

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `model` | String | The model to use for summarization calls. If omitted, uses the agent's own model. | `model: "gpt-4o-mini"` |

Default: Agent's own model

parameters

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `parameters` | Object | Additional LLM parameters for summarization requests (e.g., temperature, top_p). | `parameters: { temperature: 0.3 }` |

prompt

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `prompt` | String | Custom prompt for initial summarization. Replaces the built-in checkpoint prompt. | |

Default: A structured checkpoint prompt that produces sections for Goal, Constraints & Preferences, Progress, Key Decisions, Next Steps, and Critical Context.

updatePrompt

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `updatePrompt` | String | Custom prompt for re-compaction when a prior summary already exists. Used when the summary needs to be updated with new conversation content. | |

Default: A built-in prompt that merges new messages into the existing checkpoint, compresses older details, and gives recent actions more detail.

maxSummaryTokens

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `maxSummaryTokens` | Number | Maximum number of output tokens for the summarization model response. | `maxSummaryTokens: 4096` |

reserveRatio

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `reserveRatio` | Number | Fraction of the token budget reserved as headroom (0–1). Prevents the context from being filled to absolute capacity. | `reserveRatio: 0.05` |

Default: 0.05 (5% headroom)
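As a rough illustration of the headroom math (the function name and numbers below are assumptions for illustration, not LibreChat internals):

```python
# Illustrative sketch of how reserveRatio shrinks the usable token budget.
def effective_budget(context_window: int, reserve_ratio: float = 0.05) -> int:
    """Token budget after reserving a fraction of the window as headroom."""
    return int(context_window * (1 - reserve_ratio))

# With a hypothetical 128k-token window and the default 5% reserve:
print(effective_budget(128_000))  # 121600
```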

trigger

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `trigger` | Object | Defines when summarization is activated. If omitted, summarization fires whenever message pruning drops any messages. | |

trigger Sub-keys

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `type` | String | The trigger strategy. Options: `"token_ratio"`, `"remaining_tokens"`, `"messages_to_refine"`. | `type: "token_ratio"` |
| `value` | Number | The threshold value for the chosen trigger type. | `value: 0.8` |

Trigger Types

| Type | Value | Fires When |
|------|-------|------------|
| `token_ratio` | 0.0–1.0 | The fraction of context tokens used reaches or exceeds the value |
| `remaining_tokens` | Number | The number of remaining context tokens drops to or below the value |
| `messages_to_refine` | Number | The count of messages eligible for summarization reaches or exceeds the value |
| (not set) | | Summarization fires whenever pruning drops any messages (default behavior) |

Example:

```yaml
summarization:
  trigger:
    type: "remaining_tokens"
    value: 8000
```

contextPruning

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `contextPruning` | Object | Configures position-based tool result degradation. Large tool results in older messages are progressively trimmed or cleared to reclaim token space. | |

Context pruning is an opt-in feature that operates independently of summarization. It targets large tool call results in older messages, applying two progressive stages:

  1. Soft trim — Truncates tool results to keep only the head and tail portions, with an ellipsis in between
  2. Hard clear — Replaces the entire tool result with a short placeholder

Both stages are position-based: messages closer to the beginning of the conversation (older) are pruned first.
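One way to picture the two thresholds (a sketch under assumed semantics; the exact age calculation LibreChat uses may differ):

```python
def pruning_stage(msg_index: int, total_msgs: int,
                  soft_trim_ratio: float = 0.3,
                  hard_clear_ratio: float = 0.5) -> str:
    """Pick a pruning stage from a message's relative age (hypothetical)."""
    # age == 1.0 for the oldest message, approaching 0.0 for the newest.
    age = 1 - msg_index / total_msgs
    if age >= hard_clear_ratio:
        return "hard_clear"
    if age >= soft_trim_ratio:
        return "soft_trim"
    return "keep"

# In a 10-message conversation with the default ratios:
print(pruning_stage(0, 10))  # hard_clear (oldest)
print(pruning_stage(6, 10))  # soft_trim
print(pruning_stage(9, 10))  # keep (newest)
```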

contextPruning Sub-keys

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `enabled` | Boolean | Enables position-based tool result degradation. | `enabled: true` |
| `keepLastAssistants` | Number | Number of recent assistant turns to protect from any pruning. | `keepLastAssistants: 3` |
| `softTrimRatio` | Number | Age ratio (0–1) at which soft-trim activates. Messages older than this ratio of the conversation are candidates for soft-trimming. | `softTrimRatio: 0.3` |
| `hardClearRatio` | Number | Age ratio (0–1) at which hard-clear activates. Messages older than this ratio are candidates for full replacement. | `hardClearRatio: 0.5` |
| `minPrunableToolChars` | Number | Minimum character count of a tool result before pruning applies. Smaller results are left untouched. | `minPrunableToolChars: 50000` |
| `softTrim` | Object | Configuration for the soft-trim stage. | |
| `hardClear` | Object | Configuration for the hard-clear stage. | |

Defaults:

| Field | Default |
|-------|---------|
| `enabled` | `false` |
| `keepLastAssistants` | `3` |
| `softTrimRatio` | `0.3` |
| `hardClearRatio` | `0.5` |
| `minPrunableToolChars` | `50000` |

softTrim Sub-keys

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `maxChars` | Number | Maximum total characters after soft-trimming a tool result. | `maxChars: 4000` |
| `headChars` | Number | Number of characters to preserve from the beginning of the tool result. | `headChars: 1500` |
| `tailChars` | Number | Number of characters to preserve from the end of the tool result. | `tailChars: 1500` |

Defaults: `maxChars: 4000`, `headChars: 1500`, `tailChars: 1500`
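The head/tail behavior can be sketched like this (a hypothetical helper; the actual marker text LibreChat inserts between the two halves may differ):

```python
def soft_trim(text: str, max_chars: int = 4000,
              head_chars: int = 1500, tail_chars: int = 1500) -> str:
    """Truncate an oversized tool result, keeping its head and tail."""
    if len(text) <= max_chars:
        return text  # small enough: leave untouched
    # Keep the beginning and end, drop the middle.
    return text[:head_chars] + "\n…\n" + text[-tail_chars:]

trimmed = soft_trim("x" * 50_000)
print(len(trimmed))  # 3003: 1500 head + 1500 tail + 3-char separator
```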

hardClear Sub-keys

| Key | Type | Description | Example |
|-----|------|-------------|---------|
| `enabled` | Boolean | Whether the hard-clear stage is active. When disabled, only soft-trim is applied. | `enabled: true` |
| `placeholder` | String | Placeholder text that replaces the full tool result content when hard-cleared. | `placeholder: "[Old tool result content cleared]"` |

Defaults: `enabled: true`, `placeholder: "[Old tool result content cleared]"`

Example:

```yaml
summarization:
  contextPruning:
    enabled: true
    keepLastAssistants: 5
    softTrimRatio: 0.25
    hardClearRatio: 0.6
    minPrunableToolChars: 30000
    softTrim:
      maxChars: 6000
      headChars: 2500
      tailChars: 2500
    hardClear:
      enabled: true
      placeholder: "[Content removed for context management]"
```

Complete Configuration Example

```yaml
version: 1.3.8
cache: true

summarization:
  provider: "openAI"
  model: "gpt-4o-mini"
  maxSummaryTokens: 4096
  reserveRatio: 0.05
  trigger:
    type: "token_ratio"
    value: 0.8
  contextPruning:
    enabled: true
    keepLastAssistants: 3
    softTrimRatio: 0.3
    hardClearRatio: 0.5
    minPrunableToolChars: 50000
    softTrim:
      maxChars: 4000
      headChars: 1500
      tailChars: 1500
    hardClear:
      enabled: true
      placeholder: "[Old tool result content cleared]"
```

Migration from Per-Endpoint Settings

If you previously used `summarize` and `summaryModel` on custom or Azure OpenAI endpoints:

```yaml
endpoints:
  custom:
    - name: "My Endpoint"
      summarize: true
      summaryModel: "gpt-3.5-turbo"
```

These fields have been removed. Use the top-level `summarization` configuration instead:

```yaml
summarization:
  model: "gpt-4o-mini"
```

Notes

  • Summarization is configured globally rather than per-endpoint
  • The `summarize` and `summaryModel` fields on custom endpoints and Azure OpenAI endpoints are no longer supported
  • When `provider` and `model` are omitted, the agent's own provider and model are used for summarization
  • Context pruning is disabled by default and must be explicitly enabled with `contextPruning.enabled: true`
  • Context pruning only affects tool call results that exceed `minPrunableToolChars` — smaller results are never pruned
  • The `keepLastAssistants` setting protects recent turns from pruning regardless of the trim/clear ratios
  • Custom `prompt` and `updatePrompt` values fully replace the built-in prompts — use with care
  • Set `AGENT_DEBUG_LOGGING=true` in your `.env` file to enable verbose logging of token counts and context pruning diagnostics
