
Control Your Costs

See also: Solo Guide §6 — Controlling Costs for the strategic framing: flat-rate vs pay-per-use, what drives the bill, and typical cost patterns.


Set a maximum cumulative spend for a project in your preferences:

---
version: 1
budget_ceiling: 50.00
budget_enforcement: pause
---

Three enforcement modes are available:

| Mode | Behaviour |
| --- | --- |
| warn | Log a warning, continue executing |
| pause | Pause auto mode, wait for user action (recommended) |
| halt | Stop auto mode entirely |

pause is the right default — it gives you visibility at the moment the ceiling is hit and lets you decide whether to increase it, switch to a cheaper token profile, or stop.
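The three modes can be read as a small decision function. The sketch below is illustrative only: `Enforcement` and `on_spend_update` are hypothetical names, not GSD internals, and it assumes the ceiling counts as reached when cumulative spend is equal to or above it.

```python
from enum import Enum


class Enforcement(Enum):
    WARN = "warn"
    PAUSE = "pause"
    HALT = "halt"


def on_spend_update(spent: float, ceiling: float, mode: Enforcement) -> str:
    """Return the action taken after a cost update (hypothetical helper)."""
    if spent < ceiling:
        return "continue"
    if mode is Enforcement.WARN:
        # warn: log a warning, continue executing
        return "log-warning-and-continue"
    if mode is Enforcement.PAUSE:
        # pause: pause auto mode, wait for user action
        return "pause-and-wait-for-user"
    # halt: stop auto mode entirely
    return "stop-auto-mode"
```

With `budget_ceiling: 50.00` and `budget_enforcement: pause`, a spend of 50.00 or more would map to the pause action in this sketch.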

Dashboard: Ctrl+Alt+G or /gsd status shows real-time cost breakdown.

Export: /gsd export generates a full retrospective with cost by phase, slice, and model.

Aggregations available in the dashboard:

  • By phase (research, planning, execution, completion, reassessment)
  • By slice (M001/S01, M001/S02, …)
  • By model (which models consumed the most budget)
  • Project totals

All metrics are stored in .gsd/metrics.json and survive across sessions. Every unit’s input tokens, output tokens, cache read/write, cost, duration, and tool call count are captured automatically.
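The dashboard aggregations amount to grouping per-unit costs by one field. The sketch below shows the idea on an in-memory list. The record shape (`phase`, `slice`, `model`, `cost` keys) is an assumption made for illustration; the actual layout of .gsd/metrics.json is not documented here and may differ.

```python
from collections import defaultdict


def aggregate_cost(units, key):
    """Sum the cost of each unit, grouped by the given field."""
    totals = defaultdict(float)
    for unit in units:
        totals[unit[key]] += unit["cost"]
    return dict(totals)


# Hypothetical unit records, not real ledger output.
units = [
    {"phase": "planning", "slice": "M001/S01", "model": "claude-sonnet-4-6", "cost": 0.5},
    {"phase": "execution", "slice": "M001/S01", "model": "claude-sonnet-4-6", "cost": 1.25},
    {"phase": "execution", "slice": "M001/S02", "model": "claude-haiku-4-5", "cost": 0.25},
]

by_phase = aggregate_cost(units, "phase")
by_slice = aggregate_cost(units, "slice")
by_model = aggregate_cost(units, "model")
project_total = sum(u["cost"] for u in units)
```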

Once at least two slices have completed, GSD projects remaining cost:

Projected remaining: $12.40 ($6.20/slice avg × 2 remaining)

If the budget ceiling has been reached, a warning is appended to the projection.
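The projection line above is consistent with a simple mean of completed slice costs multiplied by the number of remaining slices. A minimal sketch of that arithmetic follows; `project_remaining` is a hypothetical helper, and the slice costs in the usage note are made-up inputs chosen to reproduce the example line.

```python
def project_remaining(completed_costs, remaining):
    """Format a cost projection, or return None with fewer than two completed slices."""
    if len(completed_costs) < 2:
        return None
    avg = sum(completed_costs) / len(completed_costs)
    return (
        f"Projected remaining: ${avg * remaining:.2f} "
        f"(${avg:.2f}/slice avg × {remaining} remaining)"
    )
```

For example, two completed slices costing $6.00 and $6.40 with two slices remaining give an average of $6.20 and a projection of $12.40.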


The token_profile preference coordinates model selection, phase skipping, and context compression in one setting. Set it per-project via /gsd prefs project:

---
version: 1
token_profile: balanced
---

budget — Maximum Savings (40–60% reduction)

Use when the work is well-understood and mostly mechanical. Cheaper models, skipped optional phases, compressed context.

| Dimension | Setting |
| --- | --- |
| Planning model | Sonnet |
| Execution model | Sonnet |
| Simple task model | Haiku |
| Completion model | Haiku |
| Milestone research | Skipped |
| Slice research | Skipped |
| Roadmap reassessment | Skipped |
| Context inline level | Minimal |

Best for: prototyping, small projects, well-understood codebases, documentation runs.

balanced — Smart Defaults (default)

The right choice for most active development. Keeps important phases, skips ones with diminishing returns, uses standard context compression.

| Dimension | Setting |
| --- | --- |
| Planning model | User’s default |
| Execution model | User’s default |
| Completion model | User’s default |
| Slice research | Skipped |
| Roadmap reassessment | Runs |
| Context inline level | Standard |

Best for: most projects and day-to-day development.

quality — Full Context (no compression)

Every phase runs. Every context artefact is inlined. No shortcuts. Use for architectural decisions, greenfield projects, and critical production work where getting it right first time is worth the cost premium.

Each profile maps to an inline level that controls what gets pre-loaded into dispatch prompts:

| Profile | Inline Level | What’s Included |
| --- | --- | --- |
| budget | minimal | Task plan, essential prior summaries (truncated). Drops decisions register, requirements, UAT template. |
| balanced | standard | Task plan, prior summaries, slice plan, roadmap excerpt. Drops some supplementary templates. |
| quality | full | Everything: all plans, summaries, decisions, requirements, templates, and root files. |

At budget and balanced levels, GSD also applies deterministic prompt compression (heuristic text compression, deduplication, and low-information boilerplate removal) before falling back to section-boundary truncation. The budget profile additionally uses TF-IDF semantic chunking for large files (> 3 KB) to include only the relevant portions.
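To give a feel for TF-IDF based chunk selection, here is a minimal pure-Python sketch. It is not GSD's implementation: how GSD splits files into chunks, what it uses as the query, and how many chunks it keeps are not documented here, so `tokenize`, `select_chunks`, and `top_k` are all hypothetical. The scoring is plain term frequency times a smoothed inverse document frequency.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def select_chunks(chunks, query, top_k=2):
    """Return the top_k chunks most relevant to the query, in original order."""
    docs = [Counter(tokenize(c)) for c in chunks]
    n = len(docs)

    def idf(term):
        # Smoothed IDF so terms present in every chunk still get a finite weight.
        df = sum(1 for d in docs if term in d)
        return math.log((1 + n) / (1 + df)) + 1.0

    query_terms = set(tokenize(query))
    scored = []
    for i, d in enumerate(docs):
        length = sum(d.values()) or 1
        score = sum((d[t] / length) * idf(t) for t in query_terms)
        scored.append((score, i))

    # Highest score first; ties keep the earlier chunk.
    best = sorted(scored, key=lambda s: (-s[0], s[1]))[:top_k]
    return [chunks[i] for _, i in sorted(best, key=lambda s: s[1])]
```

Chunks that share no terms with the query score zero and are dropped first, which is the effect the budget profile relies on to keep only the relevant portions of a large file.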

Explicit phases and models preferences always override profile defaults:

---
version: 1
token_profile: budget
phases:
  skip_research: false       # override: run research even on budget profile
models:
  planning: claude-opus-4-6  # override: use Opus for planning despite budget profile
---
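The override rule can be pictured as a shallow merge in which explicit preferences win over the profile's values. The sketch below assumes that model of resolution. `resolve` is a hypothetical function, and the contents of `BUDGET_DEFAULTS` are a simplified stand-in for whatever the budget profile actually sets internally.

```python
# Simplified stand-in for the budget profile's defaults (assumed shape).
BUDGET_DEFAULTS = {
    "phases": {"skip_research": True},
    "models": {"planning": "sonnet", "execution": "sonnet"},
}


def resolve(profile_defaults, prefs):
    """Merge explicit preferences over profile defaults, section by section."""
    resolved = {section: dict(values) for section, values in profile_defaults.items()}
    for section in ("phases", "models"):
        resolved.setdefault(section, {}).update(prefs.get(section, {}))
    return resolved


prefs = {
    "token_profile": "budget",
    "phases": {"skip_research": False},
    "models": {"planning": "claude-opus-4-6"},
}
resolved = resolve(BUDGET_DEFAULTS, prefs)
```

Only the keys named in the preferences change. Here research runs and planning uses Opus, while execution keeps the profile's model.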

Dynamic routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This can reduce token consumption by 20–50% on capped plans.

Dynamic routing is off by default. Enable it in preferences:

---
version: 1
dynamic_routing:
  enabled: true
---

dynamic_routing:
  enabled: true
  tier_models:                 # explicit model per tier (optional)
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true    # bump tier on task failure (default: true)
  budget_pressure: true        # auto-downgrade near budget ceiling (default: true)
  cross_provider: true         # consider models from other providers (default: true)
  hooks: true                  # apply routing to post-unit hooks (default: true)

When tier_models is omitted, the router uses a built-in capability mapping:

  • Light: claude-haiku-4-5, gpt-4o-mini, gemini-2.0-flash
  • Standard: claude-sonnet-4-6, gpt-4o, gemini-2.5-pro
  • Heavy: claude-opus-4-6, gpt-4.5-preview, gemini-2.5-pro

Units are classified by pure heuristics — no LLM calls, sub-millisecond:

| Signal | Simple → Light | Complex → Heavy |
| --- | --- | --- |
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | | ≥ 5 |
| Complexity keywords | None | Present |

Complexity keywords: research, investigate, refactor, migrate, integrate, complex, architect, redesign, security, performance, concurrent, parallel, distributed, backward compat
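A heuristic classifier over these signals could look like the sketch below. The thresholds and keyword list come from the table above. How the signals are combined is not specified, so this sketch assumes that any single Heavy signal wins, that Light requires every Light condition to hold, and that everything else is Standard. `Unit` and `classify` are hypothetical names.

```python
from dataclasses import dataclass

COMPLEXITY_KEYWORDS = (
    "research", "investigate", "refactor", "migrate", "integrate", "complex",
    "architect", "redesign", "security", "performance", "concurrent",
    "parallel", "distributed", "backward compat",
)


@dataclass
class Unit:
    steps: int
    files: int
    description: str
    code_blocks: int = 0


def classify(unit: Unit) -> str:
    """Assign a tier from cheap signals only (no model calls)."""
    text = unit.description.lower()
    has_keyword = any(k in text for k in COMPLEXITY_KEYWORDS)

    # Assumption: one Heavy signal is enough to classify the unit as heavy.
    if (unit.steps >= 8 or unit.files >= 8 or len(unit.description) > 2000
            or unit.code_blocks >= 5 or has_keyword):
        return "heavy"

    # Assumption: Light needs all Light conditions to hold at once.
    if unit.steps <= 3 and unit.files <= 3 and len(unit.description) < 500:
        return "light"
    return "standard"
```

Because the keyword check is a substring match, a description containing "complexity" also triggers the "complex" keyword in this sketch.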

Non-task units have built-in tier assignments:

| Unit Type | Default Tier |
| --- | --- |
| complete-slice, run-uat | Light |
| research-*, plan-*, complete-milestone | Standard |
| execute-task | Standard (upgraded by task analysis) |
| replan-slice, reassess-roadmap | Heavy |
| hook/* | Light |

When a task fails at a given tier, the router escalates on retry: Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.
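The escalation ladder can be sketched as a retry loop that moves up one tier after each failure. `escalate` and `run_with_escalation` are hypothetical helpers, `attempt` stands in for whatever actually dispatches the task, and the retry count is an assumed parameter rather than a documented GSD setting.

```python
TIERS = ["light", "standard", "heavy"]


def escalate(tier: str) -> str:
    """Return the next tier up, staying at heavy once the top is reached."""
    i = TIERS.index(tier)
    return TIERS[min(i + 1, len(TIERS) - 1)]


def run_with_escalation(task, start_tier, attempt, max_retries=2):
    """Try the task, escalating on each failure. Returns the tier that succeeded, or None."""
    tier = start_tier
    for _ in range(max_retries + 1):
        if attempt(task, tier):
            return tier
        tier = escalate(tier)
    return None
```

A task that only a heavy model can complete would be tried at light, then standard, then heavy, using one retry per tier instead of exhausting retries on the cheapest model.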

Token profiles and dynamic routing are complementary: profiles set the baseline models, and routing optimises further within those baselines. A budget profile with dynamic routing provides the maximum cost reduction. A quality profile with dynamic routing still routes within quality-tier models, because the profile sets the ceiling and the router does not override it.


When approaching the budget ceiling, the complexity router automatically downgrades model assignments before you hit the wall:

| Budget Used | Effect |
| --- | --- |
| < 50% | No adjustment |
| 50–75% | Standard → Light |
| 75–90% | More aggressive downgrading |
| > 90% | Nearly everything → Light; only Heavy stays at Standard |

This graduated response spreads the budget across remaining work rather than exhausting it early on complex tasks.
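The bands above can be sketched as a function of the fraction of budget used. Two things here are assumptions, not documented behaviour: the exact boundary handling (which band 50%, 75%, and 90% fall into), and the 75–90% band, which the table describes only as "more aggressive". This sketch treats that band as also moving Heavy down to Standard, which with three tiers gives the same result as the > 90% band. `apply_budget_pressure` is a hypothetical name.

```python
def apply_budget_pressure(tier: str, spent: float, ceiling: float) -> str:
    """Downgrade a tier assignment based on how much of the budget is used."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    used = spent / ceiling

    if used < 0.50:
        return tier  # no adjustment
    if used < 0.75:
        # 50-75%: Standard -> Light, Heavy untouched
        return "light" if tier == "standard" else tier
    if used <= 0.90:
        # 75-90%: assumed behaviour, the source does not spell this band out
        return "standard" if tier == "heavy" else "light"
    # > 90%: nearly everything -> Light; Heavy work runs at Standard
    return "standard" if tier == "heavy" else "light"
```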

Add a second safety valve — if context window usage for a single task exceeds a threshold, auto mode pauses before dispatch:

---
version: 1
context_pause_threshold: 80 # pause if context exceeds 80% of window
---

Useful for catching unexpectedly large tasks before they burn significant tokens.
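The check behind this setting is a percentage comparison made before dispatch. A minimal sketch follows, assuming "exceeds" means strictly greater than the threshold. `should_pause` and its parameter names are hypothetical. Integer arithmetic is used so that a prompt at exactly 80% does not trip the check through floating-point rounding.

```python
def should_pause(prompt_tokens: int, window_tokens: int, threshold_pct: int = 80) -> bool:
    """Return True when the prompt would use more than threshold_pct of the context window."""
    return prompt_tokens * 100 > threshold_pct * window_tokens
```

With a 200,000-token window and the threshold at 80, a 170,000-token prompt pauses auto mode, while a 160,000-token prompt sits exactly on the threshold and proceeds.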

GSD tracks the success and failure of each tier assignment and adjusts future classifications. This is automatic and persists in .gsd/routing-history.json.

If a tier’s failure rate exceeds 20% for a given pattern, future classifications for that pattern are bumped up one tier. Use /gsd rate to submit manual feedback:

/gsd rate over # model was overpowered — encourage cheaper next time
/gsd rate ok # model was appropriate — no adjustment
/gsd rate under # model was too weak — encourage stronger next time

Feedback signals are weighted 2× compared to automatic outcomes. Requires dynamic routing to be active.
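One way to picture how the 20% threshold and the 2× feedback weight interact is a weighted failure rate. The formula below is an assumption made for illustration, not the documented algorithm: automatic outcomes count once, `/gsd rate under` counts as a failure with weight 2, and `/gsd rate ok` counts as a success with weight 2. The `over` rating is left out of this sketch. All names are hypothetical.

```python
TIERS = ["light", "standard", "heavy"]


def weighted_failure_rate(auto_success, auto_fail, rated_ok=0, rated_under=0,
                          feedback_weight=2.0):
    """Failure rate for one pattern at one tier, with manual feedback weighted higher."""
    fails = auto_fail + feedback_weight * rated_under
    total = auto_success + auto_fail + feedback_weight * (rated_ok + rated_under)
    return fails / total if total else 0.0


def adjusted_tier(tier, failure_rate, threshold=0.20):
    """Bump the tier by one when the failure rate exceeds the threshold."""
    if failure_rate <= threshold:
        return tier
    i = TIERS.index(tier)
    return TIERS[min(i + 1, len(TIERS) - 1)]
```

With 8 automatic successes and 2 failures, the rate is exactly 20% and nothing changes. A single `under` rating raises it to 4/12, about 33%, which bumps future classifications for that pattern up one tier.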

The metrics ledger tracks cacheHitRate per unit (percentage of input tokens served from cache) and provides an aggregate cache hit rate for the session. Check it via /gsd status → Metrics tab.
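As a worked illustration of the metric, the hit rate is the share of input tokens that came from cache. The sketch below assumes each unit reports cached and uncached input token counts separately, and that the session aggregate is weighted by tokens and is not a plain average of per-unit rates. Both are assumptions, and the function names are hypothetical.

```python
def cache_hit_rate(cached_tokens: int, uncached_tokens: int) -> float:
    """Percentage of input tokens served from cache for one unit."""
    total = cached_tokens + uncached_tokens
    return cached_tokens / total * 100 if total else 0.0


def session_cache_hit_rate(units) -> float:
    """Token-weighted hit rate across (cached, uncached) pairs."""
    cached = sum(c for c, _ in units)
    uncached = sum(u for _, u in units)
    return cache_hit_rate(cached, uncached)
```

A unit with 750 cached and 250 uncached input tokens has a 75% hit rate. Adding a second unit with 1,000 uncached tokens and no cache reads brings the session rate down to 37.5%.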


---
version: 1
token_profile: budget
budget_ceiling: 25.00
dynamic_routing:
  enabled: true
models:
  execution_simple: claude-haiku-4-5-20250414
---

---
version: 1
token_profile: balanced
dynamic_routing:
  enabled: true
models:
  planning:
    model: claude-opus-4-6
    fallbacks:
      - openrouter/z-ai/glm-5
  execution: claude-sonnet-4-6
---

---
version: 1
token_profile: quality
models:
  planning: claude-opus-4-6
  execution: claude-opus-4-6
---

  • Start with balanced and a generous budget_ceiling to establish baseline costs
  • Check /gsd status after a few slices to see per-slice cost averages
  • Switch to budget for well-understood, repetitive, or documentation work
  • Use quality only when architectural decisions are being made
  • Enable dynamic_routing for automatic model downgrading on simple tasks
  • Use /gsd visualize → Metrics tab to see where your budget is going
  • Run /gsd export after a milestone for a full cost breakdown by phase and slice