Skills

36 skills are bundled with GSD. Skills are loaded automatically when relevant — the agent detects the context and activates the appropriate skill without you needing to invoke it explicitly. Detection is keyword- and context-based: phrases like “optimize my code”, “a11y audit”, or “create a workflow” each trigger specific skills.

Some skills can also be invoked directly as slash commands and accept arguments (e.g., /lint src/ --fix, /test run). Sub-skills are nested under their parent and share its detection context.

Click any card to expand for details.

accessibility

Audit and improve web accessibility following WCAG 2.1 guidelines. Use when asked to "improve accessibility", "a11y audit", "WCAG compliance", "screen reader support", "keyboard navigation", or "make accessible".

agent-browser

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

api-design

Design or review an HTTP/REST/GraphQL API for versioning, pagination, error shapes, idempotency, auth, and evolvability. Use when asked to "design an API", "shape the endpoints", "design the schema", "add a new endpoint", "review this API", or when building/modifying a public or internal HTTP surface. Complements `design-an-interface` (which is interface-agnostic) by covering HTTP-specific concerns like status codes, cache headers, and breaking-change management.

Objective

Shape an HTTP or GraphQL API so callers get predictable, evolvable, and honest semantics. The deliverable is a concrete endpoint/schema sketch with: URL or operation names, method/verb, request shape, response shape, error shape, auth model, pagination strategy, and versioning stance. Optimize for "clients that exist in 2 years" over "client that's easy to write today".

best-practices

Apply modern web development best practices for security, compatibility, and code quality. Use when asked to "apply best practices", "security audit", "modernize code", "code quality review", or "check for vulnerabilities".

btw

Ask a quick side question about your current work without derailing the main task. Answers from existing conversation context only — no tool calls, no file reads, single concise response. Use when you need a fast answer from what is already in this session.

Objective

Answer a quick side question using only what is already present in the current conversation context. Do not read files, run commands, search, or use any tools. Give a single, concise response and return focus to the main work.

code-optimizer

Deep code optimization audit using parallel specialist agents. Each agent hunts for performance anti-patterns, inefficiencies, and suboptimal code using pattern-based detection (Grep/Glob) WITHOUT reading the full source code first — avoiding anchoring bias on existing implementations. Covers ALL optimization domains: database queries, memory leaks, algorithmic complexity, concurrency, bundle size, dead code, I/O & network, rendering/UI, data structures, error handling, caching, build config, security-performance, logging, and infrastructure. Use when asked to: "optimize my code", "find performance issues", "audit code quality", "speed up my app", "find bottlenecks", "code review for performance", "find anti-patterns", "improve code efficiency", "reduce latency", "optimize performance", "code smell detection", "find slow code", "optimize this project", "performance audit", "code optimization". Also triggers on: "optimizar codigo", "encontrar cuellos de botella", "mejorar rendimiento".

core-web-vitals

Optimize Core Web Vitals (LCP, INP, CLS) for better page experience and search ranking. Use when asked to "improve Core Web Vitals", "fix LCP", "reduce CLS", "optimize INP", "page experience optimization", or "fix layout shifts".

create-gsd-extension

Create, debug, and iterate on GSD extensions (TypeScript modules that add tools, commands, event hooks, custom UI, and providers to GSD). Use when asked to build an extension, add a tool the LLM can call, register a slash command, hook into GSD events, create custom TUI components, or modify GSD behavior. Triggers on "create extension", "build extension", "add a tool", "register command", "hook into gsd", "custom tool", "gsd plugin", "gsd extension".

create-mcp-server

Build, iterate, and evaluate Model Context Protocol (MCP) servers that expose external services as tools an LLM can call. Use when asked to "build an MCP server", "create an MCP tool", "wrap this API as MCP", "expose X to Claude", or when extending GSD with custom tool integrations. Covers research, schema/tool design, error handling, pagination, testing via MCP Inspector, and producing a 10-question eval set that proves the server actually enables real work.

Objective

Produce a high-quality MCP server that an LLM can actually use — not one that merely parses spec-compliant. Quality is measured by how well the server enables real-world task completion, which means the tool descriptions, error messages, and pagination behave under model reasoning, not just at the wire level.

create-skill

Expert guidance for creating, writing, building, and refining GSD skills. Use when working with SKILL.md files, authoring new skills, improving existing skills, or understanding skill structure and best practices.

Objective

...

create-workflow

Conversational guide for creating valid YAML workflow definitions. Use when asked to "create a workflow", "new workflow definition", "build a workflow", "workflow YAML", "define workflow steps", or "workflow from template".

debug-like-expert

Deep analysis debugging mode for complex issues. Activates methodical investigation protocol with evidence gathering, hypothesis testing, and rigorous verification. Use when standard troubleshooting fails or when issues require systematic root cause analysis.

Objective

Deep analysis debugging mode for complex issues. This skill activates methodical investigation protocols with evidence gathering, hypothesis testing, and rigorous verification when standard troubleshooting has failed.

The skill emphasizes treating code you wrote with MORE skepticism than unfamiliar code, as cognitive biases about "how it should work" can blind you to actual implementation errors. Use scientific method to systematically identify root causes rather than applying quick fixes.

decompose-into-slices

Break a plan or milestone brief into independently-grabbable vertical slices (tracer bullets). Produces slices in `M###-ROADMAP.md` by default, or GitHub issues only with explicit user confirmation. Use when asked to "break this into slices", "decompose the plan", "vertical slices", "break into issues", or when a plan is ready but needs task-level decomposition. Prefers many thin slices over few thick ones; marks dependency order explicitly.

Objective

Decompose an approved plan into the smallest useful vertical slices that each cut end-to-end through every relevant layer. Primary output is the `Slices` section of `M###-ROADMAP.md` (matching the template at `src/resources/extensions/gsd/templates/roadmap.md`). Secondary output, only with explicit confirmation, is a set of GitHub issues with blocked-by relationships wired up.

dependency-upgrade

Plan, batch, and verify dependency upgrades safely. Triages outdated packages into risk tiers, upgrades in order (dev/minor/patch first, runtime majors last), verifies each batch before moving on, and produces an auditable commit sequence. Use when asked to "upgrade deps", "bump packages", "update node_modules", "fix vulnerabilities", "upgrade React/Node/TypeScript", or after `/gsd start dep-upgrade`. Complements the dep-upgrade workflow template with execution-level rigor.

Objective

Turn a pile of outdated packages into a series of small, verifiable upgrades with clean commits. The deliverable is an ordered upgrade plan, executed with verification between batches, and a summary that flags anything risky for follow-up. No big-bang upgrades.

design-an-interface

Produce 3+ radically different designs for a module, API, or interface, compare them in prose, and synthesize a recommendation. Use when asked to "design an interface", "shape this API", "design it twice", "explore module boundaries", or when planning a new deep module and the first idea is unlikely to be the best. Based on "Design It Twice" from A Philosophy of Software Design — the value is the contrast, not the first draft.

Objective

Generate at least three radically different interface designs for a single module or API, present them sequentially, compare them honestly, and recommend one (or a hybrid). The goal is to surface the design space — not to pick fast. A small interface hiding significant complexity is a deep module; a large interface with thin implementation is a shallow one. Optimize for the former.

forensics

Post-mortem a failed GSD auto-mode run. Traces from symptom to root cause using `.gsd/activity/*.jsonl`, `.gsd/journal/YYYY-MM-DD.jsonl`, `.gsd/metrics.json`, and `.gsd/auto.lock`. Produces a filing-ready bug report with file:line references and a concrete fix suggestion. Use when asked to "forensics", "post-mortem", "why did auto-mode fail", "trace the stuck loop", "debug the crash", after `/gsd forensics` is invoked, or when a session ended in an unexpected terminal state. Reads existing artifacts — does NOT re-run anything.

Objective

Turn scattered GSD runtime artifacts into one coherent cause chain. The deliverable is a GitHub-issue-ready report that names the file and line where the bug lives, cites the evidence, and proposes a fix. Forensics is archaeology, not re-run — no modifying state, no triggering commands, just reading the paper trail.

frontend-design

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.

github-workflows

Work with GitHub Actions CI/CD workflows - read live syntax, monitor runs, and debug failures. Use when writing, running, or debugging GitHub Actions workflows.

gh Sub-skill of github-workflows

Install and configure the GitHub CLI (gh) for AI agent environments where gh may not be pre-installed and git remotes use local proxies instead of github.com. Provides auto-install script with SHA256 verification and GITHUB_TOKEN auth with anonymous fallback. Use when gh command not found, shutil.which("gh") returns None, need GitHub API access (issues, PRs, releases, workflow runs), or repository operations fail with "failed to determine base repo" error. Documents required -R flag for all gh commands in proxy environments. Includes project management: GitHub Projects V2 (gh project), milestones (REST API), issue stories (lifecycle and templates), and label taxonomy management.

grill-me

Relentless sequential interview that stress-tests a plan or design until every decision branch is resolved. Use when the user wants to "grill me", "stress-test the plan", "interrogate my design", "resolve the decision tree", or whenever a plan feels hand-wavy, under-specified, or carries hidden coupling that planning phases must surface before execution. Pairs with the discuss phase and blocks execution until alignment is reached.

Objective

Interview the user one question at a time until every material branch of the decision tree is resolved and you both share one mental model of the plan. Output is not a document — it is alignment. Surface hidden assumptions, kill fuzzy language, and walk the dependencies between decisions instead of asking everything in parallel.

handoff

Prepare a clean cross-session handoff so the next agent (or you tomorrow) can pick up exactly where you left off. Writes a focused `continue.md` in the active slice directory and ensures `STATE.md` + summary artifacts are current. Use when asked to "hand off", "prepare handoff", "pause work", "bookmark this", "I'll come back to this later", before running out of context budget, or at the end of a long session with unfinished work. Closes the v1 `/gsd-pause-work` parity gap.

Objective

Leave the project in a state where a fresh agent with no memory of this session can read two or three files and be productive within one minute. The deliverable is `continue.md` in the active slice directory plus up-to-date summary artifacts — not a chat recap.

lint

Lint and format code. Auto-detects ESLint, Biome, Prettier, or language-native formatters and runs them with auto-fix. Reports remaining issues with actionable suggestions.

Objective

Lint and format code in the current project. Auto-detect the project's linter and formatter toolchain, run them against the target files, and report results grouped by severity with actionable fix suggestions.

Arguments

This skill accepts optional arguments after `/lint`:

- **No arguments**: Lint only files changed in the current working tree (`git diff --name-only` and `git diff --cached --name-only`).
- **A file or directory path**: Lint only that specific path (e.g., `/lint src/utils`).
- **`--fix`**: Automatically apply safe fixes. Can be combined with a path (e.g., `/lint src/ --fix`).
- **`--fix` without a path**: Auto-fix changed files only.

Parse the arguments before proceeding. If `--fix` is present, set fix mode. If a non-flag argument is present, treat it as the target path.

Detection

Auto-detect the project's linter and formatter by checking configuration files in the project root. Check in this order and use the **first match found** for each category (linter vs. formatter). A project may have both a linter and a formatter.

**JavaScript/TypeScript Linters:**

1. **Biome** — Look for `biome.json` or `biome.jsonc` in the project root.
   - Lint command: `npx @biomejs/biome check .` (or `--apply` with `--fix`)
   - Format command: `npx @biomejs/biome format .` (or `--write` with `--fix`)
   - Biome handles both linting and formatting. No need for a separate formatter if Biome is detected.

2. **ESLint** — Look for `.eslintrc`, `.eslintrc.*` (js, cjs, json, yml, yaml), `eslint.config.*` (js, mjs, cjs, ts, mts, cts), or an `"eslintConfig"` key in `package.json`.
   - Lint command: `npx eslint .` (or `--fix` with `--fix`)
   - Check `package.json` for the installed version. ESLint 9+ uses flat config (`eslint.config.*`).

**JavaScript/TypeScript Formatters (only if Biome was NOT detected):**

3. **Prettier** — Look for `.prettierrc`, `.prettierrc.*`, `prettier.config.*`, or a `"prettier"` key in `package.json`.
   - Format check: `npx prettier --check .`
   - Format fix: `npx prettier --write .`

**Rust:**

4. **rustfmt** — Look for `rustfmt.toml` or `.rustfmt.toml`, or `Cargo.toml` in the project root.
   - Format check: `cargo fmt -- --check`
   - Format fix: `cargo fmt`
   - Lint: `cargo clippy` (if available)

**Go:**

5. **Go tools** — Look for `go.mod` in the project root.
   - Format check: `gofmt -l .`
   - Format fix: `gofmt -w .`
   - Lint: `golangci-lint run` (if installed), otherwise `go vet ./...`

**Python:**

6. **Ruff** — Look for `ruff.toml` or a `[tool.ruff]` section in `pyproject.toml`.
   - Lint command: `ruff check .` (or `--fix` with `--fix`)
   - Format command: `ruff format .` (or `--check` without `--fix`)

7. **Black** — Look for a `[tool.black]` section in `pyproject.toml`, or `black` in requirements files.
   - Format check: `black --check .`
   - Format fix: `black .`

If no linter or formatter is detected, inform the user and suggest common options for their project type based on the files present.

make-interfaces-feel-better

Design engineering principles for making interfaces feel polished. Use when building UI components, reviewing frontend code, implementing animations, hover states, shadows, borders, typography, micro-interactions, enter/exit animations, or any visual detail work. Triggers on UI polish, design details, "make it feel better", "feels off", stagger animations, border radius, optical alignment, font smoothing, tabular numbers, image outlines, box shadows.

observability

Add agent-first observability to code — structured logs, health endpoints, failure-state persistence, and explicit failure modes — so the next agent hitting a problem at 3am has the signals it needs to diagnose. Use when asked to "add logging", "add observability", "add metrics", "debug later", "make this observable", or when building/refactoring a subsystem that will run unattended (auto-mode engine, background jobs, servers, watchers). Operationalizes VISION.md's "agent-first observability" principle.

Objective

Instrument code so that a cold-start agent can understand what happened by reading signals, not by rerunning with extra logging. The deliverable is a set of specific instrumentation additions: structured logs at decision points, health/status surfaces for long-running processes, persisted failure state, and explicit failure modes that don't get swallowed.

react-best-practices

React and Next.js performance optimization guidelines from Vercel Engineering. This skill should be used when writing, reviewing, or refactoring React/Next.js code to ensure optimal performance patterns. Triggers on tasks involving React components, Next.js pages, data fetching, bundle optimization, or performance improvements.

review

Review code changes for security, performance, bugs, and quality. Reviews staged changes, unstaged changes, specific commits, or PR-ready diffs.

Objective

Review code changes and provide structured feedback covering security, performance, bug risks, code quality, and test coverage gaps. This skill analyzes diffs and surrounding context to catch issues before they reach production.

security-review

Threat-model-driven security review of a change, feature, or subsystem. Runs a STRIDE-style pass (Spoofing, Tampering, Repudiation, Info disclosure, Denial of service, Elevation of privilege), examines the actual code, and produces a filing-ready report with severity, exploit scenario, and concrete remediation. Use when asked to "security review", "threat model", "check for vulnerabilities", "audit this for security", "secure this", or before shipping any change that touches auth, input handling, data access, or external surfaces.

Objective

Produce a security review that names specific exploit paths through the actual code — not a generic checklist. The deliverable is a prioritized list of findings, each with: where the issue lives, the threat category, a concrete exploit scenario, severity, and a remediation the caller can implement. Read-only: does not modify code.

spike-wrap-up

Package findings from a completed spike into a durable, project-local skill that auto-loads on future similar work. Reads the most recent `.gsd/workflows/spikes/` directory, interviews the user briefly on what's reusable, then writes `.claude/skills/<name>/SKILL.md`. Use when asked to "wrap up the spike", "package this as a skill", "make this reusable", "turn findings into a skill", or at the end of the synthesize phase of `/gsd start spike`. Closes the parity gap with GSD v1's `/gsd-spike-wrap-up`.

Objective

Convert the output of a research spike (`SCOPE.md`, `research/*.md`, `RECOMMENDATION.md`) into a project-local skill under `.claude/skills/` so that the next time a similar task comes up, the agent loads the skill automatically. This is how throwaway spikes become durable capital.

tdd

Test-driven development with red-green-refactor loops built around vertical slices (tracer bullets), not horizontal layers. Use when asked to "use TDD", "write test-first", "red-green-refactor", "build this with tests", or whenever a feature has a clear observable contract and would benefit from tests that outlive refactors. Complements the bundled test and add-tests skills — use this for the discipline, use those for the mechanics.

Objective

Drive feature implementation through one red-green-refactor cycle per vertical slice. Each cycle produces a single failing test that pins one observable behavior, then the minimal code that makes it pass, then refactoring while GREEN. Never refactor while RED. Never write all tests up front.

test

Generate or run tests. Auto-detects test framework, generates comprehensive tests for source files, or runs existing test suites with failure analysis.

Objective

Generate or run tests for the current project. This skill auto-detects the test framework in use, generates comprehensive tests for source files, or runs existing test suites and analyzes failures.

Accepts optional arguments:
- A file path: generate tests for that source file
- `run`: run the existing test suite and analyze results
- No arguments: suggest what to test based on recent changes

userinterface-wiki

UI/UX best practices for web interfaces. Use when reviewing animations, CSS, audio, typography, UX patterns, prefetching, or icon implementations. Covers 11 categories from animation principles to typography. Outputs file:line findings.

verify-before-complete

Block completion claims until verification evidence has been produced in the current message. Use before marking a task/slice/milestone complete, before creating a commit or PR, before saying "it works" or "tests pass", and any time you are about to claim work is done. The rule is: evidence before claims, always — running the verification must happen now, not "earlier in the session". Fresh output or no claim.

Objective

Enforce the GSD "work is not done when code compiles; work is done when verification passes" contract. A completion claim without verification output generated in the current message is a broken claim. This skill is the gate.

web-design-guidelines

Review UI code for Web Interface Guidelines compliance. Use when asked to "review my UI", "check accessibility", "audit design", "review UX", or "check my site against best practices".

web-quality-audit

Comprehensive web quality audit covering performance, accessibility, SEO, and best practices. Use when asked to "audit my site", "review web quality", "run lighthouse audit", "check page quality", or "optimize my website".

write-docs

Collaborative document authoring workflow for proposals, technical specs, decision docs, README sections, ADRs, and long-form prose that must work for fresh readers. Use when asked to "write the docs", "draft a proposal", "write a spec", "write an RFC", "write the README", or when a document needs to be understandable by someone without this session's context. Three stages: gather context, iterate on structure, reader-test for a stranger.

Objective

Produce documentation that works for a reader landing cold. Not a summary of what was built, not a dump of the conversation, but a document that transfers intent to someone who wasn't here. Run it through three stages — Context → Refine → Reader-Test — and don't ship until a hypothetical reader could act on it.

write-milestone-brief

Synthesize the current conversation into a milestone brief (PRD). Writes to `M###-CONTEXT.md` by default, or files a GitHub issue only with explicit user confirmation. Use when asked to "turn this into a PRD", "draft a milestone brief", "capture this context", "write it up", or when enough has been discussed to commit the plan to paper. Does not interview — it synthesizes what is already known.

Objective

Take everything established in the current conversation (plus repo reality) and produce a milestone brief that a future agent can execute from with zero additional context. The output is a populated `M###-CONTEXT.md`, matching the template at `src/resources/extensions/gsd/templates/context.md`. Optionally, with explicit confirmation, also a GitHub issue.

Previous
Commands Next
Extensions