
Recipe: Handle UAT Failures

You’re running GSD auto-mode and a slice completes all its tasks, but the UAT (User Acceptance Test) step fails or surfaces checks requiring human follow-up. Auto-mode pauses and shows the failure details — it doesn’t automatically replan or retry. This recipe walks through how to read the failure, understand your options, and get the pipeline moving again.

This covers three UAT pause scenarios:

  • Automated UAT fails — GSD ran the checks mechanically (artifact-driven, browser-executable, or runtime-executable mode) and one or more returned FAIL or the overall verdict is PARTIAL
  • Live/mixed UAT pauses for review — auto-mode dispatches the run-uat unit but immediately pauses for UAT modes that require live app access or human judgment (live-runtime, mixed, human-experience)
  • Human-experience UAT — GSD automates what it can, marks taste-based or subjective checks as NEEDS-HUMAN, and writes a PARTIAL verdict; you review the result and decide whether to accept or fix

The scenario: Cookmate’s recipe search slice (S01) passes all its implementation tasks — search works for normal queries. But UAT reveals that searching for recipe names with special characters (like “Grandma’s Cookies” or “Mac & Cheese”) returns no results. The apostrophe and ampersand break the query.

GSD finishes executing T01 (build search API), T02 (build search UI), and T03 (add test coverage). Each task passes its own verification — unit tests pass, the API responds, the UI renders results.

.gsd/
└── milestones/
    └── M002/
        └── slices/
            └── S01/
                ├── S01-PLAN.md ← all tasks checked off
                ├── S01-SUMMARY.md ← slice summary written
                ├── S01-UAT.md ← UAT checks script written by complete-slice
                └── tasks/
                    ├── T01-PLAN.md
                    ├── T01-SUMMARY.md ← ✓ search API built
                    ├── T02-PLAN.md
                    ├── T02-SUMMARY.md ← ✓ search UI built
                    ├── T03-PLAN.md
                    └── T03-SUMMARY.md ← ✓ tests written

After complete-slice finishes, GSD automatically dispatches a run-uat unit. The runner examines S01-UAT.md to determine the UAT mode, then executes checks accordingly.

Automated modes (artifact-driven, browser-executable, runtime-executable) run checks mechanically and write a result file before auto-mode checks whether to advance. GSD pauses only if the verdict is not PASS.

Live and experience modes (live-runtime, mixed, human-experience) cause auto-mode to pause immediately after dispatch — the unit runs while paused, and you can watch it execute or wait for the result file to appear.

In this scenario, S01-UAT.md is artifact-driven. The runner executes every check with shell commands, grep, file reads, and script invocations, then records the evidence mode, actual result, and a PASS, FAIL, or NEEDS-HUMAN verdict for each check. The overall result is PASS (all checks passed), FAIL (one or more checks failed), or PARTIAL (some checks passed while others require human judgment).
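As a rough mental model, the roll-up from per-check verdicts to the overall verdict can be sketched in shell. The precedence used here (any FAIL wins; otherwise any NEEDS-HUMAN gives PARTIAL; otherwise PASS) is an assumption consistent with the Cookmate example, not GSD's actual code, and overall_verdict is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical sketch, not GSD's implementation.
# Assumed precedence: any FAIL -> FAIL, else any NEEDS-HUMAN -> PARTIAL, else PASS.
overall_verdict() {
  [ "$#" -gt 0 ] || { echo "no check verdicts given" >&2; return 1; }
  local v saw_human=0
  for v in "$@"; do
    case "$v" in
      FAIL) echo FAIL; return 0 ;;      # a single failure decides the verdict
      NEEDS-HUMAN) saw_human=1 ;;       # remember that a human must review
      PASS) ;;                          # nothing to record
      *) echo "unknown check verdict: $v" >&2; return 1 ;;
    esac
  done
  if [ "$saw_human" -eq 1 ]; then echo PARTIAL; else echo PASS; fi
}

# The four Cookmate search checks: two pass, two fail.
overall_verdict PASS PASS FAIL FAIL   # prints FAIL
```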

The UAT includes two checks for special characters in search queries. Both fail:

| Check | Mode | Result | Notes |
|------------------------------------|----------|--------|------------------------------------------|
| Search returns results for "pasta" | artifact | PASS | 3 results returned |
| Search handles pagination | artifact | PASS | Page 2 shows next batch |
| Search for "Grandma's Cookies" | artifact | FAIL | SQL error: unterminated string literal |
| Search for "Mac & Cheese" | artifact | FAIL | Returns 0 results, expected 1 |

GSD writes the result to S01-UAT-RESULT.md with verdict: FAIL.

When the UAT verdict is FAIL or PARTIAL, auto-mode pauses; it does not automatically replan or retry. You’ll see output like:

● UAT result: FAIL — S01 (Recipe Search)
2 of 4 checks failed. See .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md
⚠ Auto-mode paused. Investigate the failures and restart with /gsd auto.

The S01-UAT-RESULT.md file contains the full structured result:

---
sliceId: S01
uatType: artifact-driven
verdict: FAIL
date: 2025-01-15T14:30:00.000Z
---

The frontmatter is followed by the checks table (with Check | Mode | Result | Notes columns) and a summary of what failed.

A PARTIAL verdict occurs when some checks pass but others are marked NEEDS-HUMAN — for example, in human-experience mode when a check involves taste, visual judgment, or subjective assessment that GSD can’t honestly automate. Auto-mode blocks progression for PARTIAL just as it does for FAIL, since “some checks require human review” is not the same as “the slice is ready to ship.”

Read the UAT result file to understand exactly which checks failed and why:

> cat .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md
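On a longer checks table it can help to print only the rows that need attention. The failed_checks function below is a hypothetical helper, not a GSD command; it assumes the result file uses the pipe-delimited Check | Mode | Result | Notes layout shown in this recipe.

```shell
#!/bin/sh
# Hypothetical helper: print checks-table rows whose Result is FAIL or NEEDS-HUMAN.
# Assumes rows look like: | Check | Mode | Result | Notes |
failed_checks() {
  awk -F'|' '
    # Splitting on "|" gives $1 empty, $2 Check, $3 Mode, $4 Result, $5 Notes.
    NF >= 6 {
      result = $4
      gsub(/^[ \t]+|[ \t]+$/, "", result)   # trim padding around the cell
      if (result == "FAIL" || result == "NEEDS-HUMAN") print
    }
  ' "$1"
}

result_file=.gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md
if [ -f "$result_file" ]; then failed_checks "$result_file"; fi
```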

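For the special-character case, the failure notes point to the cause: an “unterminated string literal” error means the search term is interpolated directly into a SQL string without sanitization, so the apostrophe in “Grandma’s Cookies” ends the string literal early. Special characters break the query.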

With the cause identified, fix the underlying code. Here that means passing the search term to the database as a bound query parameter instead of interpolating it into the SQL string (escaping apostrophes and ampersands by hand is the more fragile alternative), and adding test coverage for the specific inputs that failed.
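To see why the apostrophe breaks an interpolated query, here is a minimal illustration in shell. The recipes table, name column, and helper functions are hypothetical. It covers only the apostrophe: writing a single quote inside a SQL string literal as two quotes is standard SQL, but a bound parameter in your database driver avoids hand-escaping altogether.

```shell
#!/bin/sh
# Illustration only: table, column, and helpers are hypothetical,
# not Cookmate or GSD code.

# Naive builder: the raw term is pasted inside a single-quoted SQL literal.
naive_query() {
  printf "SELECT id FROM recipes WHERE name LIKE '%%%s%%';" "$1"
}

# In SQL, a single quote inside a string literal is written as two quotes.
escape_sql_literal() {
  printf '%s' "$1" | sed "s/'/''/g"
}

# Same query with the term escaped first. Prefer bound parameters in real
# code; this only shows why the unescaped version breaks.
safer_query() {
  printf "SELECT id FROM recipes WHERE name LIKE '%%%s%%';" "$(escape_sql_literal "$1")"
}

naive_query "Grandma's Cookies"; echo
# SELECT id FROM recipes WHERE name LIKE '%Grandma's Cookies%';
#   (the quote after Grandma closes the literal; the final quote opens one that never closes)
safer_query "Grandma's Cookies"; echo
# SELECT id FROM recipes WHERE name LIKE '%Grandma''s Cookies%';
```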

You have a few options depending on how significant the fix is:

Option A — Fix it directly. If the fix is straightforward, make the code changes yourself (or in a new /gsd quick task), then delete the result file and restart auto-mode to re-run UAT.

Option B — Use /gsd steer. If the fix requires updating the slice plan or task approach, register a steering override before restarting. When auto-mode is not running, steer tells the agent to update plan documents immediately in the current conversation:

/gsd steer sanitize special characters in search queries before database interpolation

This writes the override to OVERRIDES.md and propagates it across active plan documents so the next task dispatch reflects the corrected approach.

Option C — PARTIAL / NEEDS-HUMAN verdict. When the UAT result is PARTIAL, review the NEEDS-HUMAN checks in the result file and perform them yourself. If you’re satisfied with the slice, you can delete the result file and restart auto-mode to re-run UAT. Keep in mind that GSD does not invent PASS results for subjective checks, so a human-experience UAT may come back PARTIAL again. If you’ve already confirmed the slice is complete, skip the UAT unit instead:

/gsd skip run-uat/M002/S01

Once the fix is in place, delete the result file so GSD re-runs the UAT check from scratch:

rm .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md

Then restart auto-mode:

/gsd auto

GSD derives state, sees the slice is complete but has no UAT result, and dispatches run-uat again against the fixed code.
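That state check can be sketched as a small decision function. This is a hypothetical model, not GSD's implementation; in particular, treating the presence of <SID>-SUMMARY.md as “slice complete” is an assumption based on the file layout in this recipe, and next_uat_action is a made-up name.

```shell
#!/bin/sh
# Hypothetical model of the dispatch decision, not GSD's code.
# Assumption: <SID>-SUMMARY.md marks a completed slice.
next_uat_action() {
  local slice_dir="$1" sid
  sid=$(basename "$slice_dir")
  if [ ! -f "$slice_dir/$sid-SUMMARY.md" ]; then
    echo "slice-incomplete"        # tasks still running; UAT not due yet
  elif [ ! -f "$slice_dir/$sid-UAT-RESULT.md" ]; then
    echo "dispatch run-uat"        # complete but no result: run UAT (again)
  else
    echo "check-verdict"           # a result exists; its verdict decides
  fi
}

slice=.gsd/milestones/M002/slices/S01
if [ -d "$slice" ]; then next_uat_action "$slice"; fi
```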

This time all checks pass, including the special character queries. GSD writes the updated result with verdict: PASS and advances to the next slice.

.gsd/
└── milestones/
    └── M002/
        └── slices/
            └── S01/
                ├── S01-PLAN.md ← T01–T03 ✓
                ├── S01-SUMMARY.md
                ├── S01-UAT.md
                └── S01-UAT-RESULT.md ← verdict: PASS

Key artifacts involved in a UAT failure cycle:

| File | Role |
|------|------|
| .gsd/milestones/<MID>/slices/<SID>/<SID>-UAT.md | UAT checks script, written by complete-slice and read by run-uat |
| .gsd/milestones/<MID>/slices/<SID>/<SID>-UAT-RESULT.md | UAT execution results, written by run-uat with YAML frontmatter and a checks table |
| .gsd/OVERRIDES.md | Written if you use /gsd steer to guide the fix |
| .gsd/completed-units.json | Updated if you use /gsd skip to bypass the UAT unit |

The result file uses YAML frontmatter with four fields: sliceId, uatType, verdict, and date. The dispatcher reads verdict mechanically to decide whether to advance or pause. Only a PASS verdict advances auto-mode to the next slice.
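A minimal sketch of that gate, assuming a flat “verdict: VALUE” line inside the frontmatter. It uses awk rather than a real YAML parser, and read_verdict and uat_gate are hypothetical names, not GSD commands.

```shell
#!/bin/sh
# Hypothetical sketch: read the verdict from YAML frontmatter and gate on it.
# Assumes the file starts with "---" and verdict is a plain "verdict: X" line.
read_verdict() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter
    in_fm && $0 == "---"   { exit }              # closing delimiter: give up
    in_fm && /^verdict:/ {
      sub(/^verdict:[ \t]*/, "")                 # drop the key
      sub(/[ \t\r]+$/, "")                       # drop trailing whitespace
      print
      exit
    }
  ' "$1"
}

# Only PASS advances; FAIL, PARTIAL, or a missing verdict all pause.
uat_gate() {
  case "$(read_verdict "$1")" in
    PASS) echo advance ;;
    *)    echo pause ;;
  esac
}

result_file=.gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md
if [ -f "$result_file" ]; then uat_gate "$result_file"; fi
```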

GSD supports six UAT modes, split by whether auto-mode pauses immediately after dispatch or waits for the verdict:

| Mode | How checks run | Pause behavior |
|------|----------------|----------------|
| artifact-driven | Shell, grep, file reads, scripts | Pauses only if verdict ≠ PASS |
| browser-executable | Browser navigation, screenshots, assertions | Pauses only if verdict ≠ PASS |
| runtime-executable | Command or script execution, stdout/stderr | Pauses only if verdict ≠ PASS |
| live-runtime | Real app/service, browser/network/runtime checks | Pauses immediately after dispatch |
| mixed | All automatable checks + explicit human-only list | Pauses immediately after dispatch |
| human-experience | Objective checks automated; taste/visual checks marked NEEDS-HUMAN | Pauses immediately after dispatch |
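The pause column of this table boils down to a two-way lookup. The function below simply restates it; pause_policy and its output labels are made up for illustration and are not part of GSD.

```shell
#!/bin/sh
# Hypothetical restatement of the mode table: when does auto-mode pause?
pause_policy() {
  case "$1" in
    artifact-driven|browser-executable|runtime-executable)
      echo "on-non-pass" ;;   # run to completion, pause only if verdict != PASS
    live-runtime|mixed|human-experience)
      echo "immediate" ;;     # pause right after run-uat is dispatched
    *)
      echo "unknown UAT mode: $1" >&2; return 1 ;;
  esac
}
```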

For human-experience mode, GSD automates setup, preconditions, screenshots, logs, and objective checks — but does not invent subjective PASS results. Taste-based or purely human-judgment checks are marked NEEDS-HUMAN, and the overall verdict is PARTIAL unless every required check was objective and passed.

Related commands:

  • /gsd auto — Restart after resolving the failure
  • /gsd steer — Inject a course-correction before restarting
  • /gsd skip — Bypass the UAT unit entirely if you’ve verified manually
  • /gsd capture — Record a follow-up thought without blocking the pipeline
  • /gsd status — Check current state and the active unit key for skip/steer