Cost Management
Optimize costs through multi-scenario model selection and asynchronous compaction strategies, achieving faster responses and lower usage costs while maintaining output quality.
CodeBuddy Code consumes tokens with each interaction. Costs vary based on codebase size, query complexity, and conversation length. This document covers how to track costs, multi-scenario model mechanisms, and how to reduce token consumption.
Tracking Costs
Using /cost Command
The /cost command provides detailed token usage statistics for the current session:
> /cost
⎿ Total duration (API): 9m 35.6s
Total duration (wall): 22m 14.9s
Total code changes: 0 lines added, 0 lines removed
Usage by model:
claude-sonnet-4: 875.5k input, 11.7k output, 714.3k cache read, 0 cache write

Using /context Command
The /context command analyzes current context usage, showing size distribution across different context types:
> /context
⎿ Context Usage
⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ glm-4.7 · 38.1k/200.0k tokens (19.1%)
⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁
⛁ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛁ System prompt: 2.1k tokens (1.1%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛁ System tools: 16.4k tokens (8.2%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛁ Memory files: 3.7k tokens (1.9%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛁ Messages: 15.9k tokens (7.9%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ Free space: 145.9k (72.9%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛝ Autocompact buffer: 16.0k tokens (8.0%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶
⛶ ⛶ ⛶ ⛶ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝
Memory files · /memory
└ /Users/yangsubo/.codebuddy/CODEBUDDY.md (User): 18 tokens
└ /Users/yangsubo/CODEBUDDY.md (Project): 15 tokens
└ /Users/yangsubo/workspace/genie/CODEBUDDY.md (Project): 1.2k tokens
└ /Users/yangsubo/workspace/genie/packages/agent-cli/CODEBUDDY.md (Project): 2.5k tokens
Skills and slash commands · /skills
Project
└ release: 1.1k tokens
└ gen-drawio: 846 tokens
└ task-manager: 815 tokens
└ task-add: 730 tokens
└ task-done: 525 tokens
└ mr: 519 tokens
└ task-start: 493 tokens
└ my-task: 238 tokens
└ task-list: 233 tokens
└ security-review: 30 tokens

Use /context to quickly identify what's consuming significant context space and optimize accordingly.
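The percentages /context reports are plain arithmetic: each component's share is its token count divided by the model's context window, and free space is whatever remains after the components and the reserved autocompact buffer. A minimal sketch using the figures from the sample output above:

```python
# Reproduce the /context percentage breakdown from the sample output above.
CONTEXT_WINDOW = 200_000  # context window shown in the sample

# Component sizes from the sample /context output (tokens)
components = {
    "System prompt": 2_100,
    "System tools": 16_400,
    "Memory files": 3_700,
    "Messages": 15_900,
}
autocompact_buffer = 16_000  # reserved headroom, not counted as "used"

used = sum(components.values())                    # 38,100 tokens
free = CONTEXT_WINDOW - used - autocompact_buffer  # 145,900 tokens

print(f"Used: {used / 1000:.1f}k/{CONTEXT_WINDOW / 1000:.1f}k tokens")
for name, tokens in components.items():
    print(f"{name}: {tokens / CONTEXT_WINDOW:.1%}")
print(f"Free space: {free / 1000:.1f}k")
```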
Multi-Scenario Model Mechanism
Different task scenarios have different model capability requirements. Simple file searches and quick queries can be completed with lightweight models; complex architectural design and multi-step reasoning require more powerful reasoning models.
By automatically selecting different models for different scenarios, you can achieve:
- Optimal effectiveness: Complex tasks use high-capability models to ensure quality
- Faster speed: Simple tasks use lightweight models for quicker responses
- Lower costs: Avoid using expensive high-end models for simple tasks
Scenario Types
| Scenario Type | Description | Typical Use Cases |
|---|---|---|
| default | Default model, balanced performance and cost | General programming tasks, code writing |
| lite | Lightweight fast model, low cost and high speed | File search, simple queries, quick operations |
| reasoning | Reasoning-enhanced model, powerful analysis capability | Complex analysis, architectural decisions, multi-step reasoning |
Automatic Model Selection
CodeBuddy Code automatically selects appropriate scenario models based on task type. When sub-agents execute, the system automatically resolves the corresponding scenario model based on the user's currently selected main model.
For example, lightweight sub-agents like contentAnalyzer automatically use the lite model, reducing costs and improving speed while maintaining functionality.
The Task tool supports specifying scenario type via the model parameter:
- default: Inherits parent model, suitable for general tasks
- lite: Fast and low-cost, suitable for simple searches and quick file operations
- reasoning: Enhanced reasoning capability, suitable for complex analysis and architectural decisions
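Scenario resolution can be pictured as a simple lookup: each main model maps its scenario types to concrete models, and sub-agents resolve through that table, falling back to the parent model when no mapping exists. The table and model names below are hypothetical, purely to illustrate the mechanism:

```python
# Hypothetical scenario-to-model mapping; model names are illustrative
# only, not CodeBuddy Code's actual configuration.
SCENARIO_TABLE = {
    "main-model-a": {
        "default": "main-model-a",         # inherit the parent model
        "lite": "main-model-a-mini",       # fast, low-cost variant
        "reasoning": "main-model-a-think",  # reasoning-enhanced variant
    },
}

def resolve_model(main_model: str, scenario: str = "default") -> str:
    """Resolve the concrete model for a Task-tool scenario."""
    scenarios = SCENARIO_TABLE.get(main_model, {})
    # Unknown scenarios or models fall back to the parent model
    return scenarios.get(scenario, main_model)

print(resolve_model("main-model-a", "lite"))       # lightweight variant
print(resolve_model("main-model-a", "reasoning"))  # reasoning variant
```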
Reducing Token Consumption
Token costs grow with context size: the larger the context CodeBuddy Code processes, the more tokens consumed. CodeBuddy Code automatically optimizes costs through Prompt caching (reducing costs of repeated content like system prompts) and auto-compaction (compressing conversation history when approaching context limits).
The following strategies help you keep context small and reduce per-message costs.
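To see why keeping context small matters, note that every turn resends the full conversation history as input, so total input-token consumption grows roughly quadratically with turn count. A back-of-envelope sketch (the per-turn sizes are assumed round numbers, not measurements):

```python
# Illustrative arithmetic: input tokens are resent every turn, so total
# consumption grows roughly quadratically with conversation length.
# Per-turn sizes below are assumed round numbers, not measurements.
base_context = 20_000  # system prompt, tools, memory files
per_turn = 1_000       # new tokens added by each exchange

def total_input_tokens(turns: int) -> int:
    # Turn k resends the base context plus all k earlier exchanges.
    return sum(base_context + per_turn * k for k in range(turns))

print(total_input_tokens(10))  # 10 turns
print(total_input_tokens(50))  # 50 turns: roughly 9x the 10-turn cost
```

This is also why prompt caching of the repeated prefix and compaction of old history have such a large effect on cost.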
Proactively Manage Context
Use /cost to check current token usage.
- Clean up between tasks: Use /clear to start fresh when switching to unrelated work. Stale context wastes tokens on every subsequent message. Use /rename before clearing so you can return later via /resume.
- Add custom compaction instructions: /compact Focus on code samples and API usage tells CodeBuddy Code what to preserve during compaction.
You can also customize compaction behavior in CODEBUDDY.md:
```markdown
# Compact instructions
When you are using compact, please focus on test output and code changes.
```

Async Compaction Strategy
The system automatically compacts when conversation history approaches context limits:
- Auto-triggered: Automatically starts compaction when context approaches limits
- Background execution: Compaction runs asynchronously in the background without blocking user operations
- Smart summarization: Preserves key information while compressing redundant content
- Seamless experience: Compaction happens transparently, giving a near-"infinite context" experience
Key information preserved during compaction: code change records, important decision points, user explicit preferences and instructions, key context for current tasks.
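The trigger condition amounts to a threshold check against the context window minus the reserved buffer. The sketch below mirrors the figures from the sample /context output, but the logic is an illustration of the idea, not CodeBuddy Code's actual internals:

```python
# Illustrative auto-compaction trigger; the threshold logic is an
# assumption for illustration, not CodeBuddy Code's actual internals.
CONTEXT_WINDOW = 200_000
AUTOCOMPACT_BUFFER = 16_000  # reserved headroom, as shown by /context

def should_compact(used_tokens: int) -> bool:
    """Trigger background compaction before the reserved buffer is consumed."""
    return used_tokens >= CONTEXT_WINDOW - AUTOCOMPACT_BUFFER

assert not should_compact(38_100)  # the sample session: plenty of room
assert should_compact(190_000)     # near the limit: compact in background
```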
Choose Appropriate Models
Select models based on task complexity. Use /model to switch models during a session, or set defaults in /config.
- Use lite for simple tasks: File search, quick queries, code formatting
- Use reasoning for complex tasks: Architectural design, performance optimization, complex debugging
- Use default for general tasks: Daily coding, feature implementation
Reduce MCP Server Overhead
Each MCP server adds tool definitions to context, even when idle. Run /mcp to see configured servers.
- Prefer CLI tools: gh, aws, gcloud and similar tools are more context-efficient than MCP servers because they don't add persistent tool definitions. CodeBuddy Code can run CLI commands directly without extra overhead.
- Disable unused servers: Run /mcp to view and disable unused servers.
Delegate Detailed Operations to Sub-agents
Running tests, fetching documentation, or processing log files can consume significant context. Delegate these operations to sub-agents, where detailed output stays in the sub-agent's context and only summaries return to the main conversation.
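The same idea works anywhere: run the verbose operation, keep the full output on disk, and let only a compact summary enter the conversation. A sketch of the pattern (an analogy for what a sub-agent does, not CodeBuddy Code internals):

```python
import subprocess
import tempfile

def run_and_summarize(cmd: list[str]) -> dict:
    """Run a verbose command, keep the full log out of the conversation,
    and return only a compact summary (the sub-agent pattern)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    # Full output goes to a log file, not into the context window
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
        f.write(output)
        log_path = f.name
    lines = output.splitlines()
    return {
        "exit_code": result.returncode,
        "summary": lines[-1] if lines else "(no output)",  # last line only
        "log": log_path,  # full detail stays on disk if needed
    }
```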
Write Precise Prompts
Vague requests like "improve this codebase" trigger broad scans. Precise requests like "add input validation to the login function in auth.ts" let CodeBuddy Code work efficiently with minimal file reads.
Efficient Work Patterns for Complex Tasks
For longer or more complex work, these habits help avoid wasting tokens on wrong directions:
- Use plan mode for complex tasks: Press Shift+Tab to enter plan mode. CodeBuddy Code will explore the codebase and propose approaches for your approval, avoiding expensive rework if the initial direction is wrong.
- Correct direction early: If CodeBuddy Code starts going in the wrong direction, press Escape to stop immediately. Use /rewind or double-tap Escape to restore conversation and code to a previous checkpoint.
- Provide verification targets: Include test cases, screenshots, or expected output in your prompts. When CodeBuddy Code can self-verify its work, it can catch issues before you need to request fixes.
- Test incrementally: Write one file, test it, then continue. This catches problems early while they're still easy to fix.
Background Token Consumption
CodeBuddy Code also consumes tokens for certain background features when idle:
- Conversation summarization: Background tasks summarize previous conversations for the --resume feature
- Prompt prediction: Predicts the most likely next prompt input based on conversation history
These background processes consume small amounts of tokens even without active interactions.
Related Documentation
- Sub-agents - Use sub-agents to isolate high-consumption operations
- MCP - Manage MCP server overhead
- Models - Learn about available model options
This document helps you understand how to effectively manage CodeBuddy Code usage costs.