
CodeBuddy provides inline code completions (ghost text) as you type, powered by the same LLM providers available for chat and agent mode. Completions are local-first by default — using Ollama with qwen2.5-coder — so you get fast, private suggestions with zero cloud API costs.

graph TB
    A["Keystroke"] --> B["Debounce (300ms default)"]
    B --> C["Context Gathering<br/>(prefix + suffix + imports via Tree-Sitter AST)"]
    C --> D["FIM Prompt Builder<br/>(model-specific Fill-in-the-Middle tokens)"]
    D --> E["LLM Provider (Local / Cloud)"]
    E --> F["Ghost text in editor"]

  1. Debounce: After you stop typing, CodeBuddy waits for the configured delay (default 300ms) before requesting a completion. This prevents excessive API calls during rapid typing.

  2. Context gathering: ContextCompletionService captures the following:

    • Prefix: Up to ~8,000 characters before the cursor (~2,000 tokens)
    • Suffix: Up to ~2,000 characters after the cursor (~500 tokens)
    • Imports: Extracted via Tree-Sitter AST parsing (TypeScript, JavaScript, Python) to provide type context
  3. FIM prompt building: FIMPromptService constructs a Fill-in-the-Middle prompt using model-specific tokens. If the model doesn’t support FIM, it falls back to a standard prefix-only prompt.

  4. Completion: The prompt is sent to the configured provider. Results are cached (LRU, 50 entries) to avoid duplicate requests for the same context.

  5. Display: The completion appears as ghost text in the editor. Press Tab to accept.
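
Steps 1, 2 and 4 can be sketched in TypeScript as follows. This is a minimal illustration, not CodeBuddy's actual implementation: the names debounce, gatherContext, LruCache and cacheKey, as well as the cache-key format, are hypothetical. Only the character limits (8,000 and 2,000) and the cache size (50) come from the description above.

```typescript
// Hypothetical sketch of steps 1, 2 and 4. Only the numeric limits come from
// the documentation; every name here is illustrative.
const MAX_PREFIX_CHARS = 8000; // roughly 2,000 tokens before the cursor
const MAX_SUFFIX_CHARS = 2000; // roughly 500 tokens after the cursor
const CACHE_CAPACITY = 50;

// Step 1: run `fn` only after `delayMs` has passed without another call.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Step 2: keep the text closest to the cursor on both sides.
function gatherContext(
  text: string,
  cursorOffset: number,
): { prefix: string; suffix: string } {
  const start = Math.max(0, cursorOffset - MAX_PREFIX_CHARS);
  return {
    prefix: text.slice(start, cursorOffset),
    suffix: text.slice(cursorOffset, cursorOffset + MAX_SUFFIX_CHARS),
  };
}

// Step 4: a small LRU cache. Map iterates keys in insertion order, so the
// first key is always the least recently used one.
class LruCache<V> {
  private entries = new Map<string, V>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}

// Cache key built from the gathered context (hypothetical format).
function cacheKey(ctx: { prefix: string; suffix: string }): string {
  return `${ctx.prefix}\u0000${ctx.suffix}`;
}

const completionCache = new LruCache<string>(CACHE_CAPACITY);
```

In this sketch, a keystroke handler wrapped in debounce(handler, 300) would call gatherContext, look up cacheKey(ctx) in completionCache, and contact the provider only on a cache miss.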

FIM-capable models use special tokens to mark the prefix, suffix, and fill position:

Model family           Prefix token    Suffix token    Middle token    EOT token
Qwen (default)         <|fim_prefix|>  <|fim_suffix|>  <|fim_middle|>  <|endoftext|>
DeepSeek               <|fim_begin|>   <|fim_hole|>    <|fim_end|>     <|end_of_text|>
CodeLlama              <PRE>           <SUF>           <MID>           <EOT>
StarCoder / Codestral  <fim_prefix>    <fim_suffix>    <fim_middle>    <|endoftext|>

Models without FIM support receive only the prefix text and generate the next likely tokens.
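
To make the token layout concrete, here is a minimal sketch of how a prompt might be assembled. It is not the actual FIMPromptService code. The token strings are copied from the table above, while the function names, the lookup by model-name substring, and the prefix-suffix-middle ordering are assumptions made for illustration.

```typescript
// Hypothetical sketch of FIM prompt building. Token strings are copied from
// the table above; the family lookup and function names are illustrative.
interface FimTokens {
  prefix: string;
  suffix: string;
  middle: string;
}

const FIM_TOKENS: Record<string, FimTokens> = {
  qwen: { prefix: "<|fim_prefix|>", suffix: "<|fim_suffix|>", middle: "<|fim_middle|>" },
  deepseek: { prefix: "<|fim_begin|>", suffix: "<|fim_hole|>", middle: "<|fim_end|>" },
  codellama: { prefix: "<PRE>", suffix: "<SUF>", middle: "<MID>" },
  starcoder: { prefix: "<fim_prefix>", suffix: "<fim_suffix>", middle: "<fim_middle>" },
  codestral: { prefix: "<fim_prefix>", suffix: "<fim_suffix>", middle: "<fim_middle>" },
};

// Assumption: the model family can be guessed from the model name.
function findFimTokens(model: string): FimTokens | undefined {
  const name = model.toLowerCase();
  const family = Object.keys(FIM_TOKENS).find((key) => name.includes(key));
  return family === undefined ? undefined : FIM_TOKENS[family];
}

// Assumption: prefix-suffix-middle ordering, so the model generates the
// missing middle section after the final token.
function buildPrompt(model: string, prefix: string, suffix: string): string {
  const tokens = findFimTokens(model);
  if (tokens === undefined) return prefix; // no FIM support: prefix-only prompt
  return `${tokens.prefix}${prefix}${tokens.suffix}${suffix}${tokens.middle}`;
}
```

Under these assumptions, buildPrompt("qwen2.5-coder", "const x = ", ";") yields <|fim_prefix|>const x = <|fim_suffix|>;<|fim_middle|>, and a model name with no matching family receives only the prefix text.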

Setting                           Type     Default          Description
codebuddy.completion.enabled      boolean  true             Enable or disable inline completions
codebuddy.completion.provider     enum     "Local"          Provider: Gemini, Groq, Anthropic, Deepseek, OpenAI, Qwen, GLM, Local
codebuddy.completion.model        string   "qwen2.5-coder"  Model name
codebuddy.completion.apiKey       string   ""               API key (falls back to the main provider key)
codebuddy.completion.debounceMs   number   300              Trigger delay in milliseconds (min: 50)
codebuddy.completion.maxTokens    number   128              Maximum tokens per completion
codebuddy.completion.triggerMode  enum     "automatic"      automatic (as you type) or manual (explicit trigger)
codebuddy.completion.multiLine    boolean  true             Allow multi-line completions

Command                        What it does
Toggle Inline Completions      Toggles codebuddy.completion.enabled on or off
Configure Completion Settings  Opens editor settings filtered to codebuddy.completion
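
The settings and their defaults can be modeled as a typed object. The sketch below is illustrative only: the CompletionSettings interface and resolveSettings function are hypothetical names, and clamping debounceMs to the documented minimum of 50 (rather than rejecting smaller values) is an assumption about how out-of-range input might be handled.

```typescript
// Hypothetical model of the codebuddy.completion.* settings. Defaults and the
// 50 ms minimum come from the settings table; the names are illustrative.
type Provider =
  | "Gemini" | "Groq" | "Anthropic" | "Deepseek"
  | "OpenAI" | "Qwen" | "GLM" | "Local";
type TriggerMode = "automatic" | "manual";

interface CompletionSettings {
  enabled: boolean;
  provider: Provider;
  model: string;
  apiKey: string;
  debounceMs: number;
  maxTokens: number;
  triggerMode: TriggerMode;
  multiLine: boolean;
}

const DEFAULT_SETTINGS: CompletionSettings = {
  enabled: true,
  provider: "Local",
  model: "qwen2.5-coder",
  apiKey: "",
  debounceMs: 300,
  maxTokens: 128,
  triggerMode: "automatic",
  multiLine: true,
};

const MIN_DEBOUNCE_MS = 50;

// Merge user overrides onto the defaults.
// Assumption: values below the minimum delay are clamped, not rejected.
function resolveSettings(
  overrides: Partial<CompletionSettings> = {},
): CompletionSettings {
  const merged: CompletionSettings = { ...DEFAULT_SETTINGS, ...overrides };
  merged.debounceMs = Math.max(MIN_DEBOUNCE_MS, merged.debounceMs);
  return merged;
}
```

For example, resolveSettings({ provider: "Groq", debounceMs: 10 }) keeps every other default and raises the delay to 50.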

The completion status bar item shows the active state:

  • $(zap) CodeBuddy: qwen2.5-coder — completions enabled, showing the active model
  • $(circle-slash) CodeBuddy: Off — completions disabled

Click the status bar item to open completion settings.

All 8 providers work for completions. The factory routes each provider to the appropriate SDK:

Provider         Endpoint                     FIM support
Local (default)  http://localhost:11434/v1    Yes (Qwen, DeepSeek, CodeLlama, StarCoder)
Groq             api.groq.com                 Depends on model
OpenAI           api.openai.com               No (chat fallback)
Anthropic        Via Anthropic SDK            No (chat fallback)
Gemini           Via Google AI SDK            No (chat fallback)
DeepSeek         api.deepseek.com             Yes
Qwen             dashscope-intl.aliyuncs.com  Yes
GLM              open.bigmodel.cn             Depends on model

Completions work in all file types — the provider is registered with { pattern: "**" }. Import extraction via Tree-Sitter currently supports TypeScript, JavaScript, and Python. Other languages get prefix/suffix context without import signatures.