Skip to content

Context System

Every time you send a message, CodeBuddy assembles a context window — a curated set of code snippets, file contents, and architectural knowledge that grounds the AI’s response in your actual codebase. This page explains how that context is gathered, selected, and managed.

sequenceDiagram participant User participant QC as Question Classifier participant CG as Context Gatherers participant AST as AST Analysis<br/>(Tree-Sitter) participant VDB as Vector DB<br/>(Hybrid Search) participant Arch as Architecture<br/>Context + Memories participant Sel as Smart Context<br/>Selection participant PA as Prompt Assembly participant CW as Context Window<br/>Compaction User->>QC: Message + @-mentioned files QC->>QC: Classify (codebase-related? categories? confidence?) alt Not codebase-related QC-->>PA: Raw message (skip context) else Codebase-related QC->>CG: Trigger parallel gathering par File contents CG->>CG: Load @-mentioned files and AST search CG->>AST: Extract search terms → Tree-Sitter parse AST-->>CG: Matched code snippets and Semantic search CG->>VDB: Hybrid BM25 + vector similarity VDB-->>CG: Ranked code chunks and Architecture CG->>Arch: Load architecture context + memories Arch-->>CG: Project knowledge end CG->>Sel: All gathered context Sel->>Sel: Token budget, relevance scoring, deduplication Sel-->>PA: Selected context (within budget) end PA->>PA: Assemble (persona + mission + rules + code context + format) PA->>CW: Check token usage alt Over 90% context window CW->>CW: 5-tier progressive compression CW-->>PA: Compacted prompt end PA-->>User: Final prompt → LLM

The ContextEnhancementService first classifies your question via a fast LLM call to determine if it’s codebase-related. This produces:

  • isCodebaseRelated — boolean, gates whether any code context is gathered
  • categories — what kind of question (architecture, debugging, feature, etc.)
  • confidence — how confident the classification is

Non-codebase questions (general knowledge, explanations) skip context gathering entirely, saving tokens and latency.

Files you reference with @ in the chat input are always included at highest priority. The active editor file is also auto-prepended if not already mentioned. File contents are loaded via FileService.getFilesContent().

The AnalyzeCodeProvider uses Tree-Sitter to parse your codebase and extract relevant code based on search terms. If you provided @-files, simple keyword extraction is used. Otherwise, an LLM generates search terms from your question.

The ContextRetriever runs a 4-tier search with cascading fallback:

  1. Hybrid search — vector similarity + BM25 keyword search via HybridSearchService
  2. Keyword-only — FTS4 search if embedding generation fails
  3. Legacy vector — brute-force cosine similarity scan
  4. Legacy keyword — basic term counting

For broad/architectural queries, common project files (README, package.json, entry points) are appended automatically. Results are deduplicated by file path and capped at 15 items.

See Semantic Search for details on the hybrid search pipeline.

The prompt builder also injects:

  • Architecture context — detected patterns, framework info from Codebase Analysis
  • Memories — formatted memories from the Memory System, sanitized against prompt injection
  • Team context — relevant team/project knowledge if available

All gathered code goes through SmartContextSelectorService for budget-aware selection.

ModelToken budget
Claude 3 Opus50,000
GPT-4o20,000
Qwen 2.5 Coder4,000
Default (others)4,000

Tokens are estimated at 1 token ≈ 4 characters.

SourceScore
User-selected files (@-mentioned)1.0 (always included first)
Auto-gathered code0.1–0.9 (scored by keyword density)

The selector uses extractSmartSnippets() to pull out meaningful code structures — functions, classes, interfaces, types, variables — using regex patterns for TypeScript, JavaScript, and Python.

  1. Deduplicate by filePath:startLine
  2. Sort by relevance score (highest first)
  3. Greedy fill: add snippets until the token budget is exhausted
  4. Return ContextSelectionResult with snippet list, total tokens used, whether truncation occurred, and how many snippets were dropped

When the conversation grows long, the ContextWindowCompactionService progressively compresses older messages to stay within limits.

Model familyContext window
Gemini 2.0 Flash1,000,000 tokens
Claude200,000 tokens
GPT-4o128,000 tokens
DeepSeek64,000 tokens

The effective window is resolved as: user setting (codebuddy.contextWindow) → model lookup16K default.

When usage exceeds 90% of the context window (warning at 80%):

TierStrategyWhat happens
1NONENo compaction needed
2TOOL_STRIPStrip tool call details from older messages
3MULTI_CHUNKCombine and summarize multiple message chunks
4PARTIALAggressively summarize all but recent messages
5PLAIN_FALLBACKPlain-text compression of everything except the 4 most recent messages

The 4 most recent messages are always preserved in full, regardless of compaction tier.

Inline completions use a separate, lighter context pipeline via ContextCompletionService:

ComponentSize limit
Prefix (code before cursor)2,000 tokens
Suffix (code after cursor)500 tokens
ImportsExtracted via Tree-Sitter

The FIMPromptService formats these into Fill-in-the-Middle tokens appropriate for the model family (DeepSeek, CodeLlama, StarCoder/Codestral, Qwen).

See Inline Completion for details.

SettingDefaultDescription
codebuddy.contextWindow"16k"Max context window size: 4k, 8k, 16k, 32k, 128k
codebuddy.includeHiddenfalseInclude hidden files in context gathering
codebuddy.maxFileSize"1"Max file size in MB for context gathering
codebuddy.hybridSearch.vectorWeight0.7Weight for semantic similarity in search
codebuddy.hybridSearch.textWeight0.3Weight for keyword matching in search
codebuddy.hybridSearch.topK10Max search results returned