
Cost Management

Optimize costs through multi-scenario model selection and asynchronous compaction, achieving faster responses and lower token usage while maintaining output quality.

CodeBuddy Code consumes tokens with each interaction. Costs vary with codebase size, query complexity, and conversation length. This document covers how to track costs, how the multi-scenario model mechanism works, and how to reduce token consumption.

Tracking Costs

Using /cost Command

The /cost command provides detailed token usage statistics for the current session:

/cost
  ⎿ Total duration (API):  9m 35.6s
    Total duration (wall): 22m 14.9s
    Total code changes:    0 lines added, 0 lines removed
    Usage by model:
         claude-sonnet-4:  875.5k input, 11.7k output, 714.3k cache read, 0 cache write
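To turn these token counts into an approximate dollar figure, multiply each bucket by its per-token rate. The sketch below uses placeholder prices (not real rates for any provider — check your plan's price sheet); only the token counts come from the /cost output above:

```python
# Estimate session cost from the /cost token counts above.
# The per-million-token prices below are hypothetical placeholders.
PRICE_PER_M = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,  # cache reads are typically far cheaper than fresh input
}

def estimate_cost(input_tok, output_tok, cache_read_tok):
    """Return estimated USD cost for one session's token usage."""
    return (
        input_tok * PRICE_PER_M["input"]
        + output_tok * PRICE_PER_M["output"]
        + cache_read_tok * PRICE_PER_M["cache_read"]
    ) / 1_000_000

# Figures from the sample output: 875.5k input, 11.7k output, 714.3k cache read
cost = estimate_cost(875_500, 11_700, 714_300)
print(f"~ ${cost:.2f}")  # ~ $3.02
```

Note how cache reads dominate the token count but contribute little to the total — this is why prompt caching (covered below) matters for cost.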

Using /context Command

The /context command analyzes current context usage, showing size distribution across different context types:

> /context 
  ⎿  Context Usage
     ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁    glm-4.7 · 38.1k/200.0k tokens (19.1%)
     ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁
     ⛁ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶    ⛁ System prompt: 2.1k tokens (1.1%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶    ⛁ System tools: 16.4k tokens (8.2%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶    ⛁ Memory files: 3.7k tokens (1.9%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶    ⛁ Messages: 15.9k tokens (7.9%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶    ⛶ Free space: 145.9k (72.9%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶    ⛝ Autocompact buffer: 16.0k tokens (8.0%)
     ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶
     ⛶ ⛶ ⛶ ⛶ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝

     Memory files · /memory
     └ /Users/yangsubo/.codebuddy/CODEBUDDY.md (User): 18 tokens
     └ /Users/yangsubo/CODEBUDDY.md (Project): 15 tokens
     └ /Users/yangsubo/workspace/genie/CODEBUDDY.md (Project): 1.2k tokens
     └ /Users/yangsubo/workspace/genie/packages/agent-cli/CODEBUDDY.md (Project): 2.5k tokens

     Skills and slash commands · /skills

     Project
     └ release: 1.1k tokens
     └ gen-drawio: 846 tokens
     └ task-manager: 815 tokens
     └ task-add: 730 tokens
     └ task-done: 525 tokens
     └ mr: 519 tokens
     └ task-start: 493 tokens
     └ my-task: 238 tokens
     └ task-list: 233 tokens
     └ security-review: 30 tokens

Use /context to quickly identify what's consuming significant context space and optimize accordingly.
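The figures /context reports are additive: the per-component sizes sum to the total usage, and used space plus free space plus the autocompact buffer accounts for the whole window. A quick cross-check using the numbers from the sample output above:

```python
# Cross-check the sample /context output: component sizes should sum
# to the reported total, and used + free + autocompact buffer should
# fill the entire context window.
window = 200_000                # context window size shown in the output
components = {                  # tokens, from the sample output
    "system_prompt": 2_100,
    "system_tools": 16_400,
    "memory_files": 3_700,
    "messages": 15_900,
}
free_space = 145_900
autocompact_buffer = 16_000

used = sum(components.values())  # 38,100 tokens, i.e. the reported 38.1k
print(f"used: {used} of {window} tokens")
print(used + free_space + autocompact_buffer == window)  # True
```

Doing this arithmetic mentally when you run /context makes it easy to spot which component — system tools, memory files, or messages — is worth trimming first.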

Multi-Scenario Model Mechanism

Different task scenarios have different model capability requirements. Simple file searches and quick queries can be completed with lightweight models; complex architectural design and multi-step reasoning require more powerful reasoning models.

By automatically selecting different models for different scenarios, you can achieve:

  • Optimal effectiveness: Complex tasks use high-capability models to ensure quality
  • Faster speed: Simple tasks use lightweight models for quicker responses
  • Lower costs: Avoid using expensive high-end models for simple tasks

Scenario Types

| Scenario Type | Description | Typical Use Cases |
| --- | --- | --- |
| default | Default model, balanced performance and cost | General programming tasks, code writing |
| lite | Lightweight fast model, low cost and high speed | File search, simple queries, quick operations |
| reasoning | Reasoning-enhanced model, powerful analysis capability | Complex analysis, architectural decisions, multi-step reasoning |

Automatic Model Selection

CodeBuddy Code automatically selects appropriate scenario models based on task type. When sub-agents execute, the system automatically resolves the corresponding scenario model based on the user's currently selected main model.

For example, lightweight sub-agents like contentAnalyzer automatically use the lite model, reducing costs and improving speed while maintaining functionality.

The Task tool supports specifying scenario type via the model parameter:

  • default: Inherits parent model, suitable for general tasks
  • lite: Fast and low-cost, suitable for simple searches and quick file operations
  • reasoning: Enhanced reasoning capability, suitable for complex analysis and architectural decisions
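Conceptually, scenario resolution is a lookup with a fallback to the main model. The sketch below is an illustrative assumption of how such routing might look — the table, model names, and function are hypothetical, not CodeBuddy Code internals:

```python
# Hypothetical sketch of scenario-based model resolution, loosely
# mirroring the mechanism described above. All names are illustrative
# assumptions, not CodeBuddy Code internals.
SCENARIO_MODELS = {
    "default": None,                         # inherit the parent (main) model
    "lite": "example-lite-model",            # fast, low-cost
    "reasoning": "example-reasoning-model",  # slower, stronger analysis
}

def resolve_model(scenario: str, main_model: str) -> str:
    """Map a Task scenario to a concrete model, falling back to the main model."""
    chosen = SCENARIO_MODELS.get(scenario)
    return chosen if chosen is not None else main_model

print(resolve_model("lite", "example-main-model"))     # example-lite-model
print(resolve_model("default", "example-main-model"))  # example-main-model
```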

Reducing Token Consumption

Token costs grow with context size: the larger the context CodeBuddy Code processes, the more tokens each message consumes. CodeBuddy Code automatically optimizes costs through prompt caching (which reduces the cost of repeated content such as system prompts) and auto-compaction (which compresses conversation history as it approaches the context limit).

The following strategies help you keep context small and reduce per-message costs.

Proactively Manage Context

Use /cost to check current token usage.

  • Clean up between tasks: Use /clear to start fresh when switching to unrelated work. Stale context wastes tokens on every subsequent message. Use /rename before clearing so you can return later via /resume.
  • Add custom compaction instructions: /compact Focus on code samples and API usage tells CodeBuddy Code what to preserve during compaction.

You can also customize compaction behavior in CODEBUDDY.md:

```markdown
# Compact instructions

When you are using compact, please focus on test output and code changes.
```

Async Compaction Strategy

The system automatically compacts when conversation history approaches context limits:

  • Auto-triggered: Starts automatically when conversation history approaches the context limit
  • Background execution: Runs asynchronously in the background without blocking user operations
  • Smart summarization: Preserves key information while compressing redundant content
  • Seamless experience: Compaction is invisible to the user, creating a near-"infinite context" experience

Key information preserved during compaction includes code change records, important decision points, explicit user preferences and instructions, and key context for the current task.
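A minimal sketch of threshold-triggered compaction, assuming an 8% reserved buffer like the "Autocompact buffer" shown by /context; the message format and one-line summarizer are illustrative assumptions, not the actual implementation:

```python
# Simplified sketch of threshold-triggered compaction. The 8% buffer
# mirrors the Autocompact buffer shown by /context; everything else
# is an illustrative assumption.
def should_compact(used_tokens: int, window: int, buffer_ratio: float = 0.08) -> bool:
    """Trigger compaction once usage eats into the reserved buffer."""
    return used_tokens >= window * (1 - buffer_ratio)

def compact(messages: list[str], keep_last: int = 2) -> list[str]:
    """Replace older messages with a summary, keeping the most recent turns."""
    if len(messages) <= keep_last:
        return messages
    summary = f"[summary of {len(messages) - keep_last} earlier messages]"
    return [summary] + messages[-keep_last:]

history = [f"turn {i}" for i in range(10)]
if should_compact(used_tokens=185_000, window=200_000):
    history = compact(history)
print(history)  # ['[summary of 8 earlier messages]', 'turn 8', 'turn 9']
```

In the real system the summarization step preserves the key information listed above rather than a one-line placeholder, and it runs in the background so the user never waits on it.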

Choose Appropriate Models

Select models based on task complexity. Use /model to switch models during a session, or set defaults in /config.

  • Use lite for simple tasks: File search, quick queries, code formatting
  • Use reasoning for complex tasks: Architectural design, performance optimization, complex debugging
  • Use default for general tasks: Daily coding, feature implementation

Reduce MCP Server Overhead

Each MCP server adds tool definitions to context, even when idle. Run /mcp to see configured servers.

  • Prefer CLI tools: gh, aws, gcloud, and similar tools are more context-efficient than MCP servers because they don't add persistent tool definitions. CodeBuddy Code can run CLI commands directly without extra overhead.
  • Disable unused servers: Run /mcp to view and disable unused servers.

Delegate Detailed Operations to Sub-agents

Running tests, fetching documentation, or processing log files can consume significant context. Delegate these operations to sub-agents, where detailed output stays in the sub-agent's context and only summaries return to the main conversation.

Write Precise Prompts

Vague requests like "improve this codebase" trigger broad scans. Precise requests like "add input validation to the login function in auth.ts" let CodeBuddy Code work efficiently with minimal file reads.

Efficient Work Patterns for Complex Tasks

For longer or more complex work, these habits help avoid wasting tokens on wrong directions:

  • Use plan mode for complex tasks: Press Shift+Tab to enter plan mode. CodeBuddy Code will explore the codebase and propose approaches for your approval, avoiding expensive rework if the initial direction is wrong.
  • Correct direction early: If CodeBuddy Code starts going in the wrong direction, press Escape to stop immediately. Use /rewind or double-tap Escape to restore conversation and code to a previous checkpoint.
  • Provide verification targets: Include test cases, screenshots, or expected output in your prompts. When CodeBuddy Code can self-verify its work, it can catch issues before you need to request fixes.
  • Test incrementally: Write one file, test it, then continue. This catches problems early while they're still easy to fix.

Background Token Consumption

CodeBuddy Code also consumes tokens for certain background features when idle:

  • Conversation summarization: Background tasks summarize previous conversations for the --resume feature
  • Prompt prediction: Predicts the most likely next prompt input based on conversation history

These background processes consume small amounts of tokens even without active interactions.

Related Documentation

  • Sub-agents - Use sub-agents to isolate high-consumption operations
  • MCP - Manage MCP server overhead
  • Models - Learn about available model options

This document helps you understand how to effectively manage CodeBuddy Code usage costs.