Recipe: Handle UAT Failures
When to Use This
You’re running GSD auto-mode and a slice completes all its tasks, but the UAT (User Acceptance Test) step fails or surfaces checks requiring human follow-up. Auto-mode pauses and shows the failure details; it doesn’t automatically replan or retry. This recipe walks through how to read the failure, understand your options, and get the pipeline moving again.
This covers three UAT pause scenarios:
- Automated UAT fails: GSD ran the checks mechanically (artifact-driven, browser-executable, or runtime-executable mode) and one or more returned FAIL, or the overall verdict is PARTIAL
- Live/mixed UAT pauses for review: auto-mode dispatches the run-uat unit but immediately pauses for UAT modes that require live app access or human judgment (live-runtime, mixed, human-experience)
- Human-experience UAT: GSD automates what it can, marks taste-based or subjective checks as NEEDS-HUMAN, and writes a PARTIAL verdict; you review the result and decide whether to accept or fix
Prerequisites
- GSD installed and available in your terminal
- A project running in auto-mode (/gsd auto)
- An understanding of slices and UAT from the auto-mode command reference
The scenario: Cookmate’s recipe search slice (S01) passes all its implementation tasks — search works for normal queries. But UAT reveals that searching for recipe names with special characters (like “Grandma’s Cookies” or “Mac & Cheese”) returns no results. The apostrophe and ampersand break the query.
1. Auto-mode completes all tasks
GSD finishes executing T01 (build search API), T02 (build search UI), and T03 (add test coverage). Each task passes its own verification: unit tests pass, the API responds, the UI renders results.
.gsd/
└── milestones/
    └── M002/
        └── slices/
            └── S01/
                ├── S01-PLAN.md          ← all tasks checked off
                ├── S01-SUMMARY.md       ← slice summary written
                ├── S01-UAT.md           ← UAT checks script written by complete-slice
                └── tasks/
                    ├── T01-PLAN.md
                    ├── T01-SUMMARY.md   ← ✓ search API built
                    ├── T02-PLAN.md
                    ├── T02-SUMMARY.md   ← ✓ search UI built
                    ├── T03-PLAN.md
                    └── T03-SUMMARY.md   ← ✓ tests written
2. Auto-mode runs UAT
After complete-slice finishes, GSD automatically dispatches a run-uat unit. The runner reads S01-UAT.md to determine the UAT mode, then executes checks accordingly.
Automated modes (artifact-driven, browser-executable, runtime-executable) run checks mechanically and write a result file before auto-mode checks whether to advance. GSD pauses only if the verdict is not PASS.
Live and experience modes (live-runtime, mixed, human-experience) cause auto-mode to pause immediately after dispatch — the unit runs while paused, and you can watch it execute or wait for the result file to appear.
In this scenario, S01-UAT.md is artifact-driven. The runner executes every check with shell commands, grep, file reads, and script invocations. For each check it records the evidence mode, the actual result, and a PASS, FAIL, or NEEDS-HUMAN verdict. The overall verdict is one of:
- PASS: every check passed
- FAIL: at least one check failed
- PARTIAL: some checks passed while others are marked NEEDS-HUMAN because they require human judgment
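The aggregation step can be sketched as a small shell function. This is a minimal sketch, not GSD's runner code. The name overall_verdict is hypothetical. The precedence it encodes is an assumption inferred from this recipe: the S01 table below (two passes, two failures) yields FAIL, and PARTIAL is described as passes mixed with NEEDS-HUMAN checks. So any FAIL gives FAIL, otherwise any NEEDS-HUMAN gives PARTIAL, otherwise PASS.

```shell
# overall_verdict (hypothetical): combine per-check verdicts into one.
# Assumed precedence: any FAIL -> FAIL; else any NEEDS-HUMAN -> PARTIAL; else PASS.
overall_verdict() {
  result=PASS
  for v in "$@"; do
    case "$v" in
      FAIL) echo FAIL; return 0 ;;       # one failure decides the outcome
      NEEDS-HUMAN) result=PARTIAL ;;     # remember, but keep scanning for FAIL
      PASS) ;;
      *) echo "unknown verdict: $v" >&2; return 1 ;;
    esac
  done
  echo "$result"
}

overall_verdict PASS PASS FAIL FAIL    # prints FAIL (the S01 case)
overall_verdict PASS NEEDS-HUMAN       # prints PARTIAL
```

Letting FAIL short-circuit keeps a real defect from being softened into PARTIAL just because another check also needs human review.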
The UAT includes a check for special characters in search queries. That check fails:
| Check | Mode | Result | Notes |
|---|---|---|---|
| Search returns results for "pasta" | artifact | PASS | 3 results returned |
| Search handles pagination | artifact | PASS | Page 2 shows next batch |
| Search for "Grandma's Cookies" | artifact | FAIL | SQL error: unterminated string literal |
| Search for "Mac & Cheese" | artifact | FAIL | Returns 0 results, expected 1 |
GSD writes the result to S01-UAT-RESULT.md with verdict: FAIL.
3. Auto-mode pauses
When the UAT verdict is FAIL or PARTIAL, auto-mode pauses; it does not automatically replan or retry. You’ll see output like:
● UAT result: FAIL — S01 (Recipe Search)
  2 of 4 checks failed. See .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md
⚠ Auto-mode paused. Investigate the failures and restart with /gsd auto.
The S01-UAT-RESULT.md file contains the full structured result:
---
sliceId: S01
uatType: artifact-driven
verdict: FAIL
date: 2025-01-15T14:30:00.000Z
---
The frontmatter is followed by the checks table (Check, Mode, Result, and Notes columns) and a summary of what failed.
A PARTIAL verdict occurs when some checks pass but others are marked NEEDS-HUMAN — for example, in human-experience mode when a check involves taste, visual judgment, or subjective assessment that GSD can’t honestly automate. Auto-mode blocks progression for PARTIAL just as it does for FAIL, since “some checks require human review” is not the same as “the slice is ready to ship.”
4. Investigate the failure
Read the UAT result file to understand exactly which checks failed and why:
cat .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md
For the special-character case, the failure notes make the cause clear. The search query is interpolated directly into a SQL string without sanitization, so special characters break the query.
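On a long result file, a small awk filter can pull out just the rows you care about. This sketch assumes pipe-delimited table rows in the Check | Mode | Result | Notes order shown above. checks_with_result is a hypothetical helper, not a GSD command.

```shell
# checks_with_result (hypothetical): print "check -- notes" for each table row
# whose Result column equals the requested value (FAIL by default).
# Assumes pipe-delimited rows in Check | Mode | Result | Notes order.
checks_with_result() {
  file=$1
  want=${2:-FAIL}
  awk -F'|' -v want="$want" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^[ \t]*[|]/ && NF >= 5 && trim($4) == want {
      print trim($2) " -- " trim($5)
    }
  ' "$file"
}
```

For example, `checks_with_result .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md` should list the two special-character failures. Passing NEEDS-HUMAN as the second argument lists the checks you still have to perform by hand when the verdict is PARTIAL.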
5. Fix the issue
With the cause identified, fix the underlying code. In this case:
- Stop interpolating the raw search term into the SQL string and pass it as a bound query parameter instead (escaping apostrophes by hand is only a fallback).
- Make sure an ampersand reaches the search handler intact.
- Add test coverage for the specific inputs that failed.
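Cookmate's code isn't shown here, so this is a sketch under two assumptions:
- The search term is spliced into a single-quoted SQL literal.
- The term travels to the server in a hand-built URL query string, where an unencoded & would start a new parameter.

Bound parameters remain the proper fix for the first problem. sql_quote and urlencode are hypothetical helpers. sql_quote only illustrates the standard SQL escape of doubling apostrophes. urlencode shows why & has to be percent-encoded.

```shell
# Sketch only: the real SQL fix is a bound query parameter, not string escaping.

# sql_quote (hypothetical): wrap a value in single quotes, doubling any
# embedded apostrophes, which is the standard SQL string-literal escape.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# urlencode (hypothetical): percent-encode every byte except the RFC 3986
# unreserved characters (A-Z a-z 0-9 - . _ ~), so "&" cannot split a query
# string. Handles single-line input only.
urlencode() {
  printf '%s' "$1" | LC_ALL=C awk '
    BEGIN { for (i = 0; i < 256; i++) ord[sprintf("%c", i)] = i }
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c ~ /[A-Za-z0-9._~-]/) out = out c
        else out = out sprintf("%%%02X", ord[c])
      }
      printf "%s", out
    }'
}

sql_quote "Grandma's Cookies"; echo    # 'Grandma''s Cookies'
urlencode "Mac & Cheese"; echo         # Mac%20%26%20Cheese
```

Whatever the real fix looks like, the two failing inputs from the UAT table are exactly the regression cases worth adding to T03's tests.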
You have a few options depending on how significant the fix is:
Option A — Fix it directly. If the fix is straightforward, make the code changes yourself (or in a new /gsd quick task), then proceed to step 6.
Option B — Use /gsd steer. If the fix requires updating the slice plan or task approach, register a steering override before restarting. When auto-mode is not running, steer tells the agent to update plan documents immediately in the current conversation:
/gsd steer sanitize special characters in search queries before database interpolation
This writes the override to OVERRIDES.md and propagates it across active plan documents, so the next task dispatch reflects the corrected approach.
Option C — PARTIAL / NEEDS-HUMAN verdict. When the UAT result is PARTIAL, review the NEEDS-HUMAN checks in the result file and perform them manually. If you’re satisfied with the slice, you can delete the result file and restart auto-mode to re-run UAT. Keep in mind that GSD does not invent subjective PASS results, so checks that genuinely need human judgment may come back NEEDS-HUMAN again. If you’ve confirmed the slice is complete, skip the UAT unit instead:
/gsd skip run-uat/M002/S01
6. Delete the UAT result and restart
Once the fix is in place, delete the result file so GSD re-runs UAT from scratch:
rm .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md
Then restart auto-mode:
/gsd auto
GSD derives state, sees that the slice is complete but has no UAT result, and dispatches run-uat again against the fixed code.
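If you reset UAT often, a small wrapper avoids deleting the wrong file or silently doing nothing. reset_uat is a hypothetical helper. It assumes the .gsd/milestones/<MID>/slices/<SID>/<SID>-UAT-RESULT.md layout used throughout this recipe.

```shell
# reset_uat (hypothetical): delete one slice's UAT result so the next
# /gsd auto run finds no result and dispatches run-uat again.
# Usage: reset_uat <MID> <SID> [gsd-dir, default .gsd]
reset_uat() {
  mid=$1
  sid=$2
  root=${3:-.gsd}
  result="$root/milestones/$mid/slices/$sid/$sid-UAT-RESULT.md"
  if [ -f "$result" ]; then
    rm -- "$result" && echo "removed $result"
  else
    # Fail loudly on a mistyped milestone or slice ID instead of passing silently.
    echo "no UAT result at $result" >&2
    return 1
  fi
}
```

Run `reset_uat M002 S01` from the project root, then restart with /gsd auto in your GSD session.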
7. UAT passes — next slice begins
This time all checks pass, including the special-character queries. GSD writes the updated result with verdict: PASS and advances to the next slice.
.gsd/
└── milestones/
    └── M002/
        └── slices/
            └── S01/
                ├── S01-PLAN.md          ← T01–T03 ✓
                ├── S01-SUMMARY.md
                ├── S01-UAT.md
                └── S01-UAT-RESULT.md    ← verdict: PASS
What Gets Created
Key artifacts involved in a UAT failure cycle:
| File | Role |
|---|---|
| .gsd/milestones/<MID>/slices/<SID>/<SID>-UAT.md | UAT checks script, written by complete-slice and read by run-uat |
| .gsd/milestones/<MID>/slices/<SID>/<SID>-UAT-RESULT.md | UAT execution results, written by run-uat with YAML frontmatter and a checks table |
| .gsd/OVERRIDES.md | Written if you use /gsd steer to guide the fix |
| .gsd/completed-units.json | Updated if you use /gsd skip to bypass the UAT unit |
The result file uses YAML frontmatter with four fields: sliceId, uatType, verdict, and date. The dispatcher reads verdict mechanically to decide whether to advance or pause. Only a PASS verdict advances auto-mode to the next slice.
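The documented decision can be sketched as a shell function. This is an illustration of the behavior described in this recipe, not GSD's dispatcher:
- A missing result file means run-uat is dispatched again.
- Only verdict: PASS advances.
- Anything else pauses.

next_action and its output strings are hypothetical.

```shell
# next_action (hypothetical): print what auto-mode would do for one slice,
# given the path to its UAT result file. Not GSD's actual dispatcher code.
next_action() {
  result_file=$1
  if [ ! -f "$result_file" ]; then
    echo dispatch-run-uat              # no result yet: run UAT (again)
    return 0
  fi
  # Read verdict only from the leading YAML frontmatter (between the --- lines).
  verdict=$(awk '
    NR == 1 && $0 == "---" { inside = 1; next }
    inside && $0 == "---"  { exit }
    inside && /^verdict:/  { sub(/^verdict:[ \t]*/, ""); print; exit }
  ' "$result_file")
  case "$verdict" in
    PASS) echo advance ;;                            # only PASS advances
    *)    echo "pause (${verdict:-no verdict})" ;;   # FAIL, PARTIAL, or missing
  esac
}
```

Against the result file from step 3, `next_action .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md` prints `pause (FAIL)`. After the file is deleted in step 6, it prints `dispatch-run-uat`.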
UAT Modes
GSD supports six UAT modes, split by whether auto-mode pauses immediately after dispatch or waits for the verdict:
| Mode | How checks run | Pause behavior |
|---|---|---|
| artifact-driven | Shell, grep, file reads, scripts | Pauses only if verdict ≠ PASS |
| browser-executable | Browser navigation, screenshots, assertions | Pauses only if verdict ≠ PASS |
| runtime-executable | Command or script execution, stdout/stderr | Pauses only if verdict ≠ PASS |
| live-runtime | Real app/service, browser/network/runtime checks | Pauses immediately after dispatch |
| mixed | All automatable checks + explicit human-only list | Pauses immediately after dispatch |
| human-experience | Objective checks automated; taste/visual checks marked NEEDS-HUMAN | Pauses immediately after dispatch |
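The pause column reduces to a two-way split on the mode name. pause_policy is a hypothetical helper that mirrors the table, and its output labels (on-non-pass, immediate) are made up for this sketch.

```shell
# pause_policy (hypothetical): map a UAT mode to its pause behavior,
# mirroring the UAT Modes table.
pause_policy() {
  case "$1" in
    artifact-driven|browser-executable|runtime-executable)
      echo on-non-pass ;;     # run checks, pause only if the verdict is not PASS
    live-runtime|mixed|human-experience)
      echo immediate ;;       # pause right after run-uat is dispatched
    *)
      echo "unknown UAT mode: $1" >&2
      return 1 ;;
  esac
}
```

Rejecting unknown modes, rather than defaulting to one branch, surfaces a typo in a UAT script instead of guessing how it should pause.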
For human-experience mode, GSD automates setup, preconditions, screenshots, logs, and objective checks — but does not invent subjective PASS results. Taste-based or purely human-judgment checks are marked NEEDS-HUMAN, and the overall verdict is PARTIAL unless every required check was objective and passed.
Related Commands
- /gsd auto: Restart after resolving the failure
- /gsd steer: Inject a course-correction before restarting
- /gsd skip: Bypass the UAT unit entirely if you’ve verified manually
- /gsd capture: Record a follow-up thought without blocking the pipeline
- /gsd status: Check current state and the active unit key for skip/steer