Context Window Compaction

When conversations grow long, they can exceed the LLM’s context window. The Context Window Compaction Service automatically summarizes older messages to reclaim space while preserving conversational context.

```mermaid
graph TB
    A["Before each LLM call<br/>Estimate total tokens"] --> B{"Token usage<br/>vs context window?"}
    B -->|"< 80%"| C["No action needed<br/>warningLevel: none"]
    B -->|"80–90%"| D["Warning issued<br/>warningLevel: warning"]
    B -->|"> 90%"| E["Auto-compact triggered<br/>warningLevel: critical"]
    E --> F["Run compaction pipeline"]
```
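The threshold check above can be sketched as a small pure function. This is an illustrative reimplementation of the described behavior, not the service's actual API; `warningLevelFor` is a hypothetical name.

```typescript
type WarningLevel = "none" | "warning" | "critical";

// Map estimated token usage to the warning level described in the diagram:
// below 80% no action, 80–90% warning, above 90% auto-compact.
function warningLevelFor(estimatedTokens: number, contextWindow: number): WarningLevel {
  const usage = estimatedTokens / contextWindow;
  if (usage > 0.9) return "critical"; // auto-compact triggered
  if (usage >= 0.8) return "warning";
  return "none";
}
```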

The service uses a 4-tier fallback strategy. Each tier is tried in order; the first tier that fits the context window wins.

| Tier | Name | Strategy |
| --- | --- | --- |
| 1 | Tool strip | Remove large tool result content (>200 chars) from older messages. Cheapest, no LLM call. |
| 2 | Multi-chunk | Split older messages into chunks, summarize each chunk with the LLM, replace originals. |
| 3 | Partial | Summarize only the oldest half of the conversation. |
| 4 | Plain fallback | No LLM available: generate a plain-text description of the conversation shape. |
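The first-fit fallback loop can be sketched as follows. The types and function names here are hypothetical; the real service presumably carries richer message objects and token accounting.

```typescript
type Messages = string[];
type Tier = (msgs: Messages) => Messages;

// Try each tier in order; the first candidate that fits the context window
// wins. Returns tier 0 if nothing fit (caller decides what to do then).
function compact(
  msgs: Messages,
  tiers: Tier[],
  fits: (msgs: Messages) => boolean,
): { result: Messages; tier: number } {
  for (let i = 0; i < tiers.length; i++) {
    const candidate = tiers[i](msgs);
    if (fits(candidate)) return { result: candidate, tier: i + 1 };
  }
  return { result: msgs, tier: 0 };
}
```

For example, with a cheap tier that changes nothing and a second tier that keeps only the last two messages, a three-message history falls through to tier 2.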

The 4 most recent messages are never summarized. This ensures the LLM always has the latest user request and its own most recent response, maintaining coherent conversation flow.

The system prompt is always preserved and its tokens are accounted for in the budget.
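Splitting the history so the newest messages survive untouched might look like this (a minimal sketch, assuming messages are ordered oldest-first; `splitForCompaction` is an illustrative name):

```typescript
const KEEP_RECENT = 4; // the 4 most recent messages are never summarized

// Partition the history: `older` is eligible for compaction,
// `recent` is always passed through verbatim.
function splitForCompaction<T>(messages: T[]): { older: T[]; recent: T[] } {
  const cut = Math.max(0, messages.length - KEEP_RECENT);
  return { older: messages.slice(0, cut), recent: messages.slice(cut) };
}
```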

The compaction service calculates chunk sizes dynamically:

$$\text{chunkTokens} = \min(\text{contextWindow} \times 0.4,\; 12000)$$

The ratio decreases to a minimum of 0.15 for very large context windows. Each chunk includes structural wrapping (User:, Assistant:, Tool: labels) and individual messages are capped at 10,000 characters before being sent to the summarizer.
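In code, the chunk-size formula reduces to a one-liner. Note this sketch only enforces the stated 0.15 floor on the ratio; the document does not specify the exact schedule by which the ratio decays for large windows, so that part is left to the caller.

```typescript
// chunkTokens = min(contextWindow * ratio, 12000), ratio floored at 0.15.
function chunkTokens(contextWindow: number, ratio = 0.4): number {
  const r = Math.max(ratio, 0.15);
  return Math.min(contextWindow * r, 12_000);
}
```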

The service includes built-in context window sizes for popular models:

| Model family | Context window |
| --- | --- |
| Claude 3.5/4 | 200,000 |
| GPT-4o / mini | 128,000 |
| o1 | 200,000 |
| Gemini 2.0 | 1,048,576 |
| Gemini 1.5 Pro | 2,097,152 |
| Llama 3.3 70B | 128,000 |
| DeepSeek Chat | 64,000 |
| Qwen Plus | 131,072 |
| GLM-4 Plus | 128,000 |

Override with the `codebuddy.contextWindow` setting (e.g., `"128k"`).
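A parser for those string values might look like the sketch below. The binary interpretation (`128k` → 131,072) is an assumption, not confirmed by the document, and `parseContextWindow` is a hypothetical helper name.

```typescript
// Parse setting strings like "4k", "16k", "128k" into token counts,
// assuming k means 1024 (e.g. "128k" → 131072).
function parseContextWindow(setting: string): number {
  const match = /^(\d+)k$/i.exec(setting.trim());
  if (!match) throw new Error(`Unrecognized context window: ${setting}`);
  return Number(match[1]) * 1024;
}
```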

After compaction, the service returns:

| Field | Description |
| --- | --- |
| `compacted` | Whether compaction was performed |
| `originalCount` | Messages before compaction |
| `finalCount` | Messages after compaction |
| `originalTokens` | Estimated tokens before compaction |
| `finalTokens` | Estimated tokens after compaction |
| `tier` | Which compaction tier was used (0–4) |
| `warningLevel` | `none`, `warning`, or `critical` |
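Those fields, expressed as a TypeScript shape. This typing is illustrative and the example values are made up; the service's actual exported types may differ.

```typescript
interface CompactionResult {
  compacted: boolean;        // whether compaction was performed
  originalCount: number;     // messages before compaction
  finalCount: number;        // messages after compaction
  originalTokens: number;    // estimated tokens before
  finalTokens: number;       // estimated tokens after
  tier: 0 | 1 | 2 | 3 | 4;   // which tier was used (0–4)
  warningLevel: "none" | "warning" | "critical";
}

// Purely illustrative example value:
const example: CompactionResult = {
  compacted: true,
  originalCount: 40,
  finalCount: 12,
  originalTokens: 95_000,
  finalTokens: 38_000,
  tier: 2,
  warningLevel: "critical",
};
```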
| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `codebuddy.contextWindow` | enum | `"16k"` | Context window size: 4k, 8k, 16k, 32k, 128k |