Control Your Costs
See also: Solo Guide §6 — Controlling Costs for the strategic framing: flat-rate vs pay-per-use, what drives the bill, and typical cost patterns.
Setting Budgets & Tracking Costs
Budget Ceiling
Set a maximum cumulative spend for a project in your preferences:

```yaml
---
version: 1
budget_ceiling: 50.00
budget_enforcement: pause
---
```

Three enforcement modes are available:
| Mode | Behaviour |
|---|---|
| `warn` | Log a warning, continue executing |
| `pause` | Pause auto mode, wait for user action (recommended) |
| `halt` | Stop auto mode entirely |
`pause` is the right default: it gives you visibility at the moment the ceiling is hit and lets you decide whether to raise the ceiling, switch to a cheaper token profile, or stop.
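The three modes can be pictured as a small decision function. This is a minimal sketch, not GSD's implementation: the `Enforcement` enum, the `on_ceiling_check` function, its return strings, and the assumption that the ceiling counts as hit once spend is equal to or above it are all hypothetical.

```python
from enum import Enum


class Enforcement(Enum):
    WARN = "warn"
    PAUSE = "pause"
    HALT = "halt"


def on_ceiling_check(spent: float, ceiling: float, mode: Enforcement) -> str:
    """Return what auto mode does next: 'continue', 'pause', or 'stop'."""
    if spent < ceiling:
        # Assumption: nothing happens until cumulative spend reaches the ceiling.
        return "continue"
    if mode is Enforcement.WARN:
        # warn: log a warning and keep executing.
        print(f"warning: budget ceiling ${ceiling:.2f} reached (spent ${spent:.2f})")
        return "continue"
    if mode is Enforcement.PAUSE:
        # pause: wait for the user to raise the ceiling, change profile, or stop.
        return "pause"
    # halt: stop auto mode entirely.
    return "stop"
```

With `pause`, the caller gets a chance to change preferences and resume; with `halt`, it does not.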
Viewing Costs
Dashboard: Ctrl+Alt+G or `/gsd status` shows a real-time cost breakdown.
Export: /gsd export generates a full retrospective with cost by phase, slice, and model.
Aggregations available in the dashboard:
- By phase (research, planning, execution, completion, reassessment)
- By slice (M001/S01, M001/S02, …)
- By model (which models consumed the most budget)
- Project totals
All metrics are stored in .gsd/metrics.json and survive across sessions. Every unit’s input tokens, output tokens, cache read/write, cost, duration, and tool call count are captured automatically.
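The dashboard aggregations amount to grouping per-unit cost by one field. The sketch below illustrates the idea on made-up records; the field names (`phase`, `slice`, `model`, `cost`) and the sample values are assumptions, and the real layout of `.gsd/metrics.json` may differ.

```python
from collections import defaultdict

# Hypothetical per-unit records, for illustration only.
units = [
    {"phase": "planning", "slice": "M001/S01", "model": "claude-opus-4-6", "cost": 0.5},
    {"phase": "execution", "slice": "M001/S01", "model": "claude-sonnet-4-6", "cost": 1.25},
    {"phase": "execution", "slice": "M001/S02", "model": "claude-haiku-4-5", "cost": 0.25},
]


def cost_by(records, key):
    """Sum the cost of all records that share the same value for `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost"]
    return dict(totals)


by_phase = cost_by(units, "phase")
by_slice = cost_by(units, "slice")
by_model = cost_by(units, "model")
project_total = sum(record["cost"] for record in units)
```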
Cost Projections
Once at least two slices have completed, GSD projects the remaining cost:

```
Projected remaining: $12.40 ($6.20/slice avg × 2 remaining)
```

If the budget ceiling has been reached, a warning is appended to the projection.
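The arithmetic behind that line is the average cost of completed slices multiplied by the number of slices left. A minimal sketch, assuming a plain arithmetic mean; the function name `project_remaining` and the sample slice costs are hypothetical.

```python
def project_remaining(completed_slice_costs, remaining_slices):
    """Return the projection line, or None until two slices have completed."""
    if len(completed_slice_costs) < 2:
        return None
    avg = sum(completed_slice_costs) / len(completed_slice_costs)
    projected = avg * remaining_slices
    return (
        f"Projected remaining: ${projected:.2f} "
        f"(${avg:.2f}/slice avg × {remaining_slices} remaining)"
    )


# Two completed slices at $5.90 and $6.50 average $6.20 per slice.
line = project_remaining([5.90, 6.50], 2)
```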
Token Optimisation Strategies
The `token_profile` preference coordinates model selection, phase skipping, and context compression in one setting. Set it per-project via `/gsd prefs project`:

```yaml
---
version: 1
token_profile: balanced
---
```

The Three Profiles
budget: Maximum Savings (40–60% reduction)
Use when the work is well-understood and mostly mechanical. Cheaper models, skipped optional phases, compressed context.
| Dimension | Setting |
|---|---|
| Planning model | Sonnet |
| Execution model | Sonnet |
| Simple task model | Haiku |
| Completion model | Haiku |
| Milestone research | Skipped |
| Slice research | Skipped |
| Roadmap reassessment | Skipped |
| Context inline level | Minimal |
Best for: prototyping, small projects, well-understood codebases, documentation runs.
balanced: Smart Defaults (default)
The right choice for most active development. Keeps important phases, skips ones with diminishing returns, uses standard context compression.
| Dimension | Setting |
|---|---|
| Planning model | User’s default |
| Execution model | User’s default |
| Completion model | User’s default |
| Slice research | Skipped |
| Roadmap reassessment | Runs |
| Context inline level | Standard |
Best for: most projects and day-to-day development.
quality: Full Context (no compression)
Every phase runs. Every context artefact is inlined. No shortcuts. Use for architectural decisions, greenfield projects, and critical production work where getting it right first time is worth the cost premium.
Context Compression
Each profile maps to an inline level that controls what gets pre-loaded into dispatch prompts:
| Profile | Inline Level | What’s Included |
|---|---|---|
| `budget` | `minimal` | Task plan, essential prior summaries (truncated). Drops decisions register, requirements, UAT template. |
| `balanced` | `standard` | Task plan, prior summaries, slice plan, roadmap excerpt. Drops some supplementary templates. |
| `quality` | `full` | Everything: all plans, summaries, decisions, requirements, templates, and root files. |
At budget and balanced levels, GSD also applies deterministic prompt compression (heuristic text compression, deduplication, and low-information boilerplate removal) before falling back to section-boundary truncation. The budget profile additionally uses TF-IDF semantic chunking for large files (> 3 KB) to include only the relevant portions.
Per-Phase Overrides
Explicit `phases` and `models` preferences always override profile defaults:

```yaml
---
version: 1
token_profile: budget
phases:
  skip_research: false        # override: run research even on budget profile
models:
  planning: claude-opus-4-6   # override: use Opus for planning despite budget profile
---
```

Dynamic Model Routing
Dynamic routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This can reduce token consumption by 20–50% on capped plans.
Enabling
Dynamic routing is off by default. Enable it in preferences:

```yaml
---
version: 1
dynamic_routing:
  enabled: true
---
```

Configuration

```yaml
dynamic_routing:
  enabled: true
  tier_models:                 # explicit model per tier (optional)
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true    # bump tier on task failure (default: true)
  budget_pressure: true        # auto-downgrade near budget ceiling (default: true)
  cross_provider: true         # consider models from other providers (default: true)
  hooks: true                  # apply routing to post-unit hooks (default: true)
```

When `tier_models` is omitted, the router uses a built-in capability mapping:
- Light: `claude-haiku-4-5`, `gpt-4o-mini`, `gemini-2.0-flash`
- Standard: `claude-sonnet-4-6`, `gpt-4o`, `gemini-2.5-pro`
- Heavy: `claude-opus-4-6`, `gpt-4.5-preview`, `gemini-2.5-pro`
How Classification Works
Units are classified by pure heuristics, with no LLM calls, in under a millisecond:
| Signal | Simple → Light | Complex → Heavy |
|---|---|---|
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | — | ≥ 5 |
| Complexity keywords | None | Present |
Complexity keywords: research, investigate, refactor, migrate, integrate, complex, architect, redesign, security, performance, concurrent, parallel, distributed, backward compat
Non-task units have built-in tier assignments:
| Unit Type | Default Tier |
|---|---|
| `complete-slice`, `run-uat` | Light |
| `research-*`, `plan-*`, `complete-milestone` | Standard |
| `execute-task` | Standard (upgraded by task analysis) |
| `replan-slice`, `reassess-roadmap` | Heavy |
| `hook/*` | Light |
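The task signals can be read as a small rule-based classifier. The sketch below is one plausible reading, not GSD's router: the source lists the thresholds but not how the signals combine, so it assumes that any single Heavy signal wins, that Light requires every Light condition to hold, and that keywords are matched as case-insensitive substrings. The `Task` type and `classify` function are hypothetical names.

```python
from dataclasses import dataclass

COMPLEXITY_KEYWORDS = (
    "research", "investigate", "refactor", "migrate", "integrate", "complex",
    "architect", "redesign", "security", "performance", "concurrent",
    "parallel", "distributed", "backward compat",
)


@dataclass
class Task:
    description: str
    step_count: int
    file_count: int
    code_blocks: int = 0


def classify(task: Task) -> str:
    """Map a task to 'light', 'standard', or 'heavy' using the table's thresholds."""
    text = task.description.lower()
    has_keyword = any(keyword in text for keyword in COMPLEXITY_KEYWORDS)

    # Assumption: one Heavy signal is enough to route to the heavy tier.
    if (
        task.step_count >= 8
        or task.file_count >= 8
        or len(task.description) > 2000
        or task.code_blocks >= 5
        or has_keyword
    ):
        return "heavy"

    # Assumption: Light needs all Light conditions (no keyword is implied here).
    if task.step_count <= 3 and task.file_count <= 3 and len(task.description) < 500:
        return "light"

    return "standard"
```

Because the checks are plain comparisons and substring tests, classification needs no model call, which is consistent with the sub-millisecond claim.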
escalate_on_failure
When a task fails at a given tier, the router escalates on retry: Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.
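The escalation ladder is a one-step move up an ordered list of tiers. A minimal sketch, assuming that a task already at Heavy stays at Heavy on further retries; `escalate` and `attempt_tiers` are hypothetical names.

```python
TIERS = ["light", "standard", "heavy"]


def escalate(tier: str) -> str:
    """Return the next tier up, staying at 'heavy' once the top is reached."""
    index = TIERS.index(tier)
    return TIERS[min(index + 1, len(TIERS) - 1)]


def attempt_tiers(start: str, max_attempts: int) -> list:
    """Tiers used on successive attempts if every attempt fails."""
    tiers = [start]
    while len(tiers) < max_attempts:
        tiers.append(escalate(tiers[-1]))
    return tiers
```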
Interaction with Token Profiles
Token profiles and dynamic routing are complementary: profiles set the baseline models, and routing optimises further within those baselines. A `budget` profile plus dynamic routing provides maximum cost reduction. A `quality` profile plus dynamic routing still routes within quality-tier models; the profile acts as the ceiling and is not overridden by the router.
Monitoring & Alerts
Budget Pressure Behaviour
As spend approaches the budget ceiling, the complexity router automatically downgrades model assignments before you hit the wall:
| Budget Used | Effect |
|---|---|
| < 50% | No adjustment |
| 50–75% | Standard → Light |
| 75–90% | More aggressive downgrading |
| > 90% | Nearly everything → Light; only Heavy stays at Standard |
This graduated response spreads the budget across remaining work rather than exhausting it early on complex tasks.
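The bands in the table can be sketched as a function of the fraction of budget used. Several details here are assumptions: the function name `apply_budget_pressure` is hypothetical, the exact boundary handling at 50% and 90% is a guess, and the 75–90% band is not specified beyond "more aggressive", so this sketch reuses the 50–75% rule there as a placeholder.

```python
def apply_budget_pressure(tier: str, used_fraction: float) -> str:
    """Downgrade a tier ('light', 'standard', 'heavy') based on budget used (0.0 to 1.0)."""
    if used_fraction < 0.50:
        # Below 50%: no adjustment.
        return tier
    if used_fraction <= 0.90:
        # 50–75%: Standard drops to Light.
        # 75–90%: rule not specified in the table; placeholder reuses the rule above.
        return "light" if tier == "standard" else tier
    # Above 90%: Heavy drops to Standard, everything else goes to Light.
    return "standard" if tier == "heavy" else "light"
```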
Context Pause Threshold
As a second safety valve, auto mode can pause before dispatch when context window usage for a single task exceeds a threshold:

```yaml
---
version: 1
context_pause_threshold: 80   # pause if context exceeds 80% of window
---
```

Useful for catching unexpectedly large tasks before they burn significant tokens.
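The threshold check is a percentage comparison made before dispatch. A minimal sketch, assuming "exceeds" means strictly greater than the threshold; `should_pause` and its parameters are hypothetical names, and the 200,000-token window in the example is only an illustration.

```python
def should_pause(context_tokens: int, window_tokens: int, threshold_percent: int = 80) -> bool:
    """True when the task's context uses more than threshold_percent of the window."""
    # Integer arithmetic avoids floating-point rounding exactly at the boundary.
    return context_tokens * 100 > threshold_percent * window_tokens


# 170,000 tokens in a 200,000-token window is 85%, above the 80% threshold.
pause_large_task = should_pause(170_000, 200_000)
```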
Adaptive Learning (Routing History)
GSD tracks the success and failure of each tier assignment and adjusts future classifications. This is automatic and persists in `.gsd/routing-history.json`.
If a tier’s failure rate exceeds 20% for a given pattern, future classifications for that pattern are bumped up one tier. Use `/gsd rate` to submit manual feedback:

```
/gsd rate over    # model was overpowered: encourage cheaper next time
/gsd rate ok      # model was appropriate: no adjustment
/gsd rate under   # model was too weak: encourage stronger next time
```

Feedback signals are weighted 2× compared to automatic outcomes. Manual feedback requires dynamic routing to be active.
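The 20% rule and the 2× feedback weight can be sketched with a pair of weighted counters per pattern and tier. This is an illustration under assumptions, not the format of `.gsd/routing-history.json`: the `TierStats` class is hypothetical, `under` is counted as a weighted failure and `ok` as a weighted success, and `over` is left unmodelled because the source does not say how it lowers a tier.

```python
from dataclasses import dataclass

TIERS = ["light", "standard", "heavy"]
FAILURE_THRESHOLD = 0.20   # from the text: bump when failure rate exceeds 20%
FEEDBACK_WEIGHT = 2.0      # from the text: manual feedback counts double


@dataclass
class TierStats:
    successes: float = 0.0
    failures: float = 0.0

    def record_outcome(self, succeeded: bool) -> None:
        """Record one automatic outcome with weight 1."""
        if succeeded:
            self.successes += 1.0
        else:
            self.failures += 1.0

    def record_feedback(self, rating: str) -> None:
        """Record manual feedback with double weight (assumed mapping)."""
        if rating == "under":
            self.failures += FEEDBACK_WEIGHT
        elif rating == "ok":
            self.successes += FEEDBACK_WEIGHT
        else:
            raise ValueError(f"rating not modelled in this sketch: {rating}")

    def failure_rate(self) -> float:
        total = self.successes + self.failures
        return self.failures / total if total else 0.0


def adjusted_tier(tier: str, stats: TierStats) -> str:
    """Bump the tier by one when the recorded failure rate exceeds the threshold."""
    if stats.failure_rate() > FAILURE_THRESHOLD:
        return TIERS[min(TIERS.index(tier) + 1, len(TIERS) - 1)]
    return tier
```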
Cache Hit Rate
The metrics ledger tracks `cacheHitRate` per unit (percentage of input tokens served from cache) and provides an aggregate cache hit rate for the session. Check it via `/gsd status` → Metrics tab.
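As a rough illustration of the metric, the sketch below computes a hit rate from per-unit token counts. The formula is an assumption: it treats total input as uncached input plus cache reads plus cache writes, and it weights the session aggregate by tokens instead of averaging per-unit percentages. The field names are hypothetical.

```python
def cache_hit_rate(input_tokens: int, cache_read: int, cache_write: int = 0) -> float:
    """Percentage of all input tokens that were served from cache."""
    total = input_tokens + cache_read + cache_write
    return 100 * cache_read / total if total else 0.0


def session_cache_hit_rate(units) -> float:
    """Token-weighted hit rate across all units in a session."""
    return cache_hit_rate(
        sum(unit["input"] for unit in units),
        sum(unit["cache_read"] for unit in units),
        sum(unit["cache_write"] for unit in units),
    )


# Hypothetical units: one mostly cached, one with no cache reads at all.
session = [
    {"input": 200, "cache_read": 800, "cache_write": 0},
    {"input": 1000, "cache_read": 0, "cache_write": 0},
]
```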
Configuration Examples
Cost-Optimised Setup

```yaml
---
version: 1
token_profile: budget
budget_ceiling: 25.00
dynamic_routing:
  enabled: true
models:
  execution_simple: claude-haiku-4-5-20250414
---
```

Balanced with Custom Models

```yaml
---
version: 1
token_profile: balanced
dynamic_routing:
  enabled: true
models:
  planning:
    model: claude-opus-4-6
    fallbacks:
      - openrouter/z-ai/glm-5
  execution: claude-sonnet-4-6
---
```

Full Quality for Critical Work

```yaml
---
version: 1
token_profile: quality
models:
  planning: claude-opus-4-6
  execution: claude-opus-4-6
---
```

Quick Tips
- Start with `balanced` and a generous `budget_ceiling` to establish baseline costs
- Check `/gsd status` after a few slices to see per-slice cost averages
- Switch to `budget` for well-understood, repetitive, or documentation work
- Use `quality` only when architectural decisions are being made
- Enable `dynamic_routing` for automatic model downgrading on simple tasks
- Use `/gsd visualize` → Metrics tab to see where your budget is going
- Run `/gsd export` after a milestone for a full cost breakdown by phase and slice