Cost Tracking
The CostTrackingService monitors token usage and computes estimated costs for every API call. It maintains pricing tables for 30+ models across 9 providers and produces per-conversation, per-provider, and aggregate summaries.
How it works
Every API call to an LLM provider consumes tokens. CodeBuddy tracks:
- Input tokens — The prompt and context sent to the model
- Output tokens — The model’s response
- Estimated cost — Computed with 6-decimal-place precision using per-model pricing tables
Cost is calculated as:
$$\text{cost} = \frac{\text{input tokens} \times \text{input price}}{1{,}000{,}000} + \frac{\text{output tokens} \times \text{output price}}{1{,}000{,}000}$$
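As a worked illustration of the formula, the sketch below computes the cost of a single call and rounds it to six decimal places. The `ModelPrice` shape and the `estimateCost` function are hypothetical names chosen for this example, not the actual internals of CostTrackingService.

```typescript
// Hypothetical shape: prices are USD per one million tokens.
interface ModelPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Applies the cost formula and rounds the result to six decimal places.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  price: ModelPrice,
): number {
  const raw =
    (inputTokens * price.inputPerMillion) / 1_000_000 +
    (outputTokens * price.outputPerMillion) / 1_000_000;
  return Math.round(raw * 1_000_000) / 1_000_000;
}

// Example rates: $3.00 input and $15.00 output per million tokens.
const examplePrice: ModelPrice = { inputPerMillion: 3.0, outputPerMillion: 15.0 };

// 12,000 input tokens and 800 output tokens:
// 0.036 (input) + 0.012 (output) = 0.048
const cost = estimateCost(12_000, 800, examplePrice);
console.log(cost.toFixed(6)); // prints "0.048000"
```

Output tokens are priced several times higher than input tokens for most models, so a short prompt that produces a long response can cost more than a long prompt with a brief answer.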
Viewing costs
Cost information is displayed at three levels:
- Per message — Token count shown below each response
- Per conversation — Cumulative cost for the current thread
- Aggregate summary — Total cost broken down by provider and model
Setting cost limits
Prevent unexpected charges with a per-task cost limit:

    { "codebuddy.costLimit": 1.0 }

When the limit is reached, CodeBuddy pauses and asks if you want to continue.
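The pause behavior can be pictured as a running total compared against the configured limit after each call. The following is a minimal sketch under stated assumptions: `CostLimitGuard` is a hypothetical class, the limit is assumed to be in USD, and "reached" is assumed to mean the total is greater than or equal to the limit. How CodeBuddy implements the check internally is not documented here.

```typescript
// Hypothetical guard: accumulates per-task spend and reports when the
// configured limit has been reached.
class CostLimitGuard {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Records the cost of one API call.
  // Returns true when the task should pause and ask the user to continue.
  record(callCostUsd: number): boolean {
    this.spentUsd += callCostUsd;
    return this.spentUsd >= this.limitUsd;
  }

  get totalUsd(): number {
    return this.spentUsd;
  }
}

// Mirrors the setting { "codebuddy.costLimit": 1.0 }.
const guard = new CostLimitGuard(1.0);
const pauseAfterFirst = guard.record(0.4);   // false: 0.40 spent
const pauseAfterSecond = guard.record(0.35); // false: 0.75 spent
const pauseAfterThird = guard.record(0.3);   // true: 1.05 spent, limit reached
console.log(pauseAfterFirst, pauseAfterSecond, pauseAfterThird);
```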
Pricing by provider
Approximate per-million-token pricing (mid-2025):
| Provider | Model | Input | Output |
|---|---|---|---|
| Anthropic | Claude Sonnet 4 | $3.00 | $15.00 |
| Anthropic | Claude Opus 4 | $15.00 | $75.00 |
| Anthropic | Claude Haiku | $0.25 | $1.25 |
| OpenAI | GPT-4o | $2.50 | $10.00 |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 |
| OpenAI | o3-mini | $1.10 | $4.40 |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 |
| Google | Gemini 2.5 Flash | $0.15 | $0.60 |
| Groq | Llama 3.3 70B | $0.59 | $0.79 |
| DeepSeek | DeepSeek Chat | $0.27 | $1.10 |
| DeepSeek | DeepSeek Reasoner | $0.55 | $2.19 |
| Qwen | Qwen Plus | $0.80 | $2.00 |
| xAI | Grok | $5.00 | $15.00 |
| Ollama | Local models | Free | Free |
When a model isn’t in the pricing table, a conservative fallback of $3.00 / $15.00 per million tokens is used.
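The fallback rule amounts to a table lookup with a default. The sketch below shows one way this could look. The `Pricing` type, the `lookupPricing` function, and the lowercase model keys are assumptions made for illustration; the real table format used by the service may differ.

```typescript
// Hypothetical pricing entry: USD per one million tokens.
interface Pricing {
  input: number;
  output: number;
}

// A small excerpt of the pricing table. The key format (lowercase model id)
// is an assumption for this example.
const PRICING_TABLE = new Map<string, Pricing>([
  ["gpt-4o", { input: 2.5, output: 10.0 }],
  ["gpt-4o-mini", { input: 0.15, output: 0.6 }],
  ["deepseek-chat", { input: 0.27, output: 1.1 }],
]);

// Fallback stated in the text: $3.00 / $15.00 per million tokens.
const FALLBACK_PRICING: Pricing = { input: 3.0, output: 15.0 };

// Returns the listed price for a model, or the fallback when it is unknown.
function lookupPricing(model: string): Pricing {
  return PRICING_TABLE.get(model.toLowerCase()) ?? FALLBACK_PRICING;
}

console.log(lookupPricing("gpt-4o-mini"));         // listed entry
console.log(lookupPricing("some-unlisted-model")); // falls back to 3.0 / 15.0
```

Because the fallback sits at the higher end of the table, estimates for an unlisted model are more likely to overstate the real cost than to understate it.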
Cost summary API
The service exposes structured cost data:

    interface ICostSummary {
      totals: {
        inputTokens: number;
        outputTokens: number;
        estimatedCost: number;
        requestCount: number;
      };
      providers: Array<{
        provider: string;
        model: string;
        inputTokens: number;
        outputTokens: number;
        estimatedCost: number;
        requestCount: number;
      }>;
      conversations: Array<{
        threadId: string;
        provider: string;
        model: string;
        inputTokens: number;
        outputTokens: number;
        estimatedCost: number;
        requestCount: number;
      }>;
    }

Tips for reducing costs
- Use smaller models for simple tasks: GPT-4o-mini or Gemini Flash for file renames, formatting, and simple edits
- Use local models: Ollama for tasks that don’t need frontier intelligence
- Set project rules: clear rules reduce unnecessary iterations and context growth
- Set a cost limit: use `codebuddy.costLimit` as a safety net
- Monitor context growth: long conversations accumulate tokens, so start new conversations for unrelated tasks