
# Inline Code Completion

CodeBuddy provides inline code completions (ghost text) as you type, powered by the same LLM providers available for chat and agent mode. Completions are local-first by default — using Ollama with qwen2.5-coder — so you get fast, private suggestions with zero cloud API costs.

```mermaid
graph TB
    A["Keystroke"] --> B["Debounce (300ms default)"]
    B --> C["Context Gathering<br/>(prefix + suffix + imports via Tree-Sitter AST)"]
    C --> D["FIM Prompt Builder<br/>(model-specific Fill-in-the-Middle tokens)"]
    D --> E["LLM Provider (Local / Cloud)"]
    E --> F["Ghost text in editor"]
```
  1. Debounce — After you stop typing, CodeBuddy waits the configured delay (default 300ms) before requesting a completion. This prevents excessive API calls during rapid typing.

  2. Context gathering — `ContextCompletionService` captures:

    • Prefix: Up to ~8,000 characters before the cursor (~2,000 tokens)
    • Suffix: Up to ~2,000 characters after the cursor (~500 tokens)
    • Imports: Extracted via Tree-Sitter AST parsing (TypeScript, JavaScript, Python) to provide type context
  3. FIM prompt building — `FIMPromptService` constructs a Fill-in-the-Middle prompt using model-specific tokens. If the model doesn’t support FIM, it falls back to a standard prefix-only prompt.

  4. Completion — The prompt is sent to the configured provider. Results are cached (LRU, 50 entries) to avoid duplicate requests for the same context.

  5. Display — The completion appears as ghost text in the editor. Press Tab to accept.
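The LRU result cache mentioned in step 4 can be sketched with a `Map`, whose insertion order doubles as recency. The class and method names below are illustrative, not CodeBuddy's actual implementation; only the 50-entry capacity comes from the documentation:

```typescript
// Minimal LRU cache for completion results, keyed by the context
// around the cursor. Capacity 50 mirrors the documented cache size.
class CompletionCache {
  private entries = new Map<string, string>();
  constructor(private capacity = 50) {}

  // Derive a stable key from the prefix/suffix pair (hypothetical scheme).
  static key(prefix: string, suffix: string): string {
    return `${prefix}\u0000${suffix}`;
  }

  get(key: string): string | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark this entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, completion: string): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, completion);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry (first in insertion order).
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}
```

Keying on the surrounding context is what lets repeated keystrokes that land back in the same state reuse a prior result instead of issuing a duplicate request.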

FIM-capable models use special tokens to mark the prefix, suffix, and fill position:

| Model family | Prefix token | Suffix token | Middle token | EOT token |
|---|---|---|---|---|
| Qwen (default) | `<\|fim_prefix\|>` | `<\|fim_suffix\|>` | `<\|fim_middle\|>` | `<\|endoftext\|>` |
| DeepSeek | `<\|fim_begin\|>` | `<\|fim_hole\|>` | `<\|fim_end\|>` | `<\|end_of_text\|>` |
| CodeLlama | `<PRE>` | `<SUF>` | `<MID>` | `<EOT>` |
| StarCoder / Codestral | `<fim_prefix>` | `<fim_suffix>` | `<fim_middle>` | `<\|endoftext\|>` |

Models without FIM support receive only the prefix text and generate the next likely tokens.
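Prompt assembly from the token table can be sketched as follows. The token strings come straight from the table above; the `buildFimPrompt` helper, the family keys, and everything else are illustrative, not CodeBuddy's actual `FIMPromptService` API:

```typescript
// Model-specific Fill-in-the-Middle tokens, per the table above.
type FimTokens = { prefix: string; suffix: string; middle: string };

const FIM_TOKENS: Record<string, FimTokens> = {
  qwen:      { prefix: "<|fim_prefix|>", suffix: "<|fim_suffix|>", middle: "<|fim_middle|>" },
  deepseek:  { prefix: "<|fim_begin|>",  suffix: "<|fim_hole|>",   middle: "<|fim_end|>" },
  codellama: { prefix: "<PRE>",          suffix: "<SUF>",          middle: "<MID>" },
  starcoder: { prefix: "<fim_prefix>",   suffix: "<fim_suffix>",   middle: "<fim_middle>" },
};

// Build a FIM prompt: the code before and after the cursor is marked
// with the model's tokens, and the model generates the "middle".
// Unknown families fall back to a plain prefix-only prompt.
function buildFimPrompt(family: string, prefix: string, suffix: string): string {
  const tokens = FIM_TOKENS[family];
  if (!tokens) return prefix; // prefix-only fallback for non-FIM models
  return `${tokens.prefix}${prefix}${tokens.suffix}${suffix}${tokens.middle}`;
}
```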

| Setting | Type | Default | Description |
|---|---|---|---|
| `codebuddy.completion.enabled` | boolean | `true` | Enable or disable inline completions |
| `codebuddy.completion.provider` | enum | `"Local"` | Provider: Gemini, Groq, Anthropic, Deepseek, OpenAI, Qwen, GLM, Local |
| `codebuddy.completion.model` | string | `"qwen2.5-coder"` | Model name |
| `codebuddy.completion.apiKey` | string | `""` | API key (falls back to the main provider key) |
| `codebuddy.completion.debounceMs` | number | `300` | Trigger delay in milliseconds (min: 50) |
| `codebuddy.completion.maxTokens` | number | `128` | Maximum tokens per completion |
| `codebuddy.completion.triggerMode` | enum | `"automatic"` | `automatic` (as you type) or `manual` (explicit trigger) |
| `codebuddy.completion.multiLine` | boolean | `true` | Allow multi-line completions |
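These settings live in your user or workspace `settings.json`. For example, a fully local setup with a slightly longer trigger delay might look like this (the values here are illustrative):

```jsonc
{
  "codebuddy.completion.enabled": true,
  "codebuddy.completion.provider": "Local",
  "codebuddy.completion.model": "qwen2.5-coder",
  "codebuddy.completion.debounceMs": 400,
  "codebuddy.completion.maxTokens": 128,
  "codebuddy.completion.triggerMode": "automatic",
  "codebuddy.completion.multiLine": true
}
```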
| Command | What it does |
|---|---|
| Toggle Inline Completions | Toggles `codebuddy.completion.enabled` on or off |
| Configure Completion Settings | Opens editor settings filtered to `codebuddy.completion` |

The completion status bar item shows the active state:

  • $(zap) CodeBuddy: qwen2.5-coder — completions enabled, showing the active model
  • $(circle-slash) CodeBuddy: Off — completions disabled

Click the status bar item to open completion settings.

All 8 providers work for completions. The factory routes each provider to the appropriate SDK:

| Provider | Endpoint | FIM support |
|---|---|---|
| Local (default) | `http://localhost:11434/v1` | Yes (Qwen, DeepSeek, CodeLlama, StarCoder) |
| Groq | `api.groq.com` | Depends on model |
| OpenAI | `api.openai.com` | No (chat fallback) |
| Anthropic | Via Anthropic SDK | No (chat fallback) |
| Gemini | Via Google AI SDK | No (chat fallback) |
| DeepSeek | `api.deepseek.com` | Yes |
| Qwen | `dashscope-intl.aliyuncs.com` | Yes |
| GLM | `open.bigmodel.cn` | Depends on model |

Completions work in all file types — the provider is registered with `{ pattern: "**" }`. Import extraction via Tree-Sitter currently supports TypeScript, JavaScript, and Python. Other languages get prefix/suffix context without import signatures.
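Capping the prefix/suffix windows described in step 2 can be sketched as below. The ~8,000- and ~2,000-character budgets come from the documentation above; the helper name and shape are illustrative, not CodeBuddy's actual `ContextCompletionService`:

```typescript
// Character budgets from step 2: ~8,000 chars of prefix (~2,000 tokens)
// and ~2,000 chars of suffix (~500 tokens).
const MAX_PREFIX_CHARS = 8000;
const MAX_SUFFIX_CHARS = 2000;

interface CompletionContext {
  prefix: string;
  suffix: string;
}

function gatherContext(document: string, cursorOffset: number): CompletionContext {
  const fullPrefix = document.slice(0, cursorOffset);
  const fullSuffix = document.slice(cursorOffset);
  return {
    // Keep the characters nearest the cursor: the tail of the prefix
    // and the head of the suffix.
    prefix: fullPrefix.slice(-MAX_PREFIX_CHARS),
    suffix: fullSuffix.slice(0, MAX_SUFFIX_CHARS),
  };
}
```

Trimming from the far ends keeps the code immediately surrounding the cursor, which is what matters most for a local completion.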