Recipe: Handle UAT Failures

When to Use This

You’re running GSD auto-mode and a slice completes all its tasks, but the UAT (User Acceptance Test) step fails or surfaces checks requiring human follow-up. Auto-mode pauses and shows the failure details — it doesn’t automatically replan or retry. This recipe walks through how to read the failure, understand your options, and get the pipeline moving again.

This covers three UAT pause scenarios:

Automated UAT fails — GSD ran the checks mechanically (artifact-driven, browser-executable, or runtime-executable mode) and one or more returned FAIL or the overall verdict is PARTIAL
Live/mixed UAT pauses for review — auto-mode dispatches the run-uat unit but immediately pauses for UAT modes that require live app access or human judgment (live-runtime, mixed, human-experience)
Human-experience UAT — GSD automates what it can, marks taste-based or subjective checks as NEEDS-HUMAN, and writes a PARTIAL verdict; you review the result and decide whether to accept or fix

Prerequisites

GSD installed and available in your terminal
A project running in auto-mode (/gsd auto)
Understanding of slices and UAT from the auto-mode command reference

Steps

The scenario: Cookmate’s recipe search slice (S01) passes all its implementation tasks — search works for normal queries. But UAT reveals that searching for recipe names with special characters (like “Grandma’s Cookies” or “Mac & Cheese”) returns no results. The apostrophe and ampersand break the query.

1. Auto-mode completes all tasks

GSD finishes executing T01 (build search API), T02 (build search UI), and T03 (add test coverage). Each task passes its own verification — unit tests pass, the API responds, the UI renders results.

.gsd/
└── milestones/
    └── M002/
        └── slices/
            └── S01/
                ├── S01-PLAN.md           ← all tasks checked off
                ├── S01-SUMMARY.md        ← slice summary written
                ├── S01-UAT.md            ← UAT checks script written by complete-slice
                └── tasks/
                    ├── T01-PLAN.md
                    ├── T01-SUMMARY.md     ← ✓ search API built
                    ├── T02-PLAN.md
                    ├── T02-SUMMARY.md     ← ✓ search UI built
                    ├── T03-PLAN.md
                    └── T03-SUMMARY.md     ← ✓ tests written

2. Auto-mode runs UAT

After complete-slice finishes, GSD automatically dispatches a run-uat unit. The runner examines S01-UAT.md to determine the UAT mode, then executes checks accordingly.

Automated modes (artifact-driven, browser-executable, runtime-executable) run checks mechanically and write a result file before auto-mode checks whether to advance. GSD pauses only if the verdict is not PASS.

Live and experience modes (live-runtime, mixed, human-experience) cause auto-mode to pause immediately after dispatch — the unit runs while paused, and you can watch it execute or wait for the result file to appear.

In this scenario, S01-UAT.md is artifact-driven. The runner executes every check with shell commands, grep, file reads, and script invocations, then records the evidence mode, actual result, and a PASS, FAIL, or NEEDS-HUMAN verdict for each check. The overall result is PASS (all checks passed), FAIL (one or more failed), or PARTIAL (some passed, some failed or require human judgment).

The UAT includes a check for special characters in search queries. That check fails:

| Check                              | Mode     | Result | Notes                                    |
|------------------------------------|----------|--------|------------------------------------------|
| Search returns results for "pasta" | artifact | PASS   | 3 results returned                       |
| Search handles pagination          | artifact | PASS   | Page 2 shows next batch                  |
| Search for "Grandma's Cookies"     | artifact | FAIL   | SQL error: unterminated string literal   |
| Search for "Mac & Cheese"          | artifact | FAIL   | Returns 0 results, expected 1            |

GSD writes the result to S01-UAT-RESULT.md with verdict: FAIL.
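The roll-up from per-check results to one overall verdict can be sketched as below. This is an illustration only, not GSD's code: the function name `overall_verdict` is hypothetical, and the text leaves open how a mix of FAIL and NEEDS-HUMAN is resolved. The sketch assumes FAIL takes precedence, which is consistent with the two-of-four example above ending in FAIL.

```python
from typing import Iterable


def overall_verdict(check_results: Iterable[str]) -> str:
    """Hypothetical roll-up of per-check verdicts into one overall verdict."""
    results = [r.strip().upper() for r in check_results]
    if not results:
        raise ValueError("no checks recorded")
    if "FAIL" in results:
        # Assumption: any failed check outranks checks awaiting human review.
        return "FAIL"
    if "NEEDS-HUMAN" in results:
        return "PARTIAL"
    if all(r == "PASS" for r in results):
        return "PASS"
    raise ValueError(f"unrecognized check result in {results!r}")


# The four checks from the table above.
print(overall_verdict(["PASS", "PASS", "FAIL", "FAIL"]))
```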

3. Auto-mode pauses

When UAT verdict is FAIL or PARTIAL, auto-mode pauses — it does not automatically replan or retry. You’ll see output like:

● UAT result: FAIL — S01 (Recipe Search)
  2 of 4 checks failed. See .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md

⚠ Auto-mode paused. Investigate the failures and restart with /gsd auto.

The S01-UAT-RESULT.md file contains the full structured result:

---
sliceId: S01
uatType: artifact-driven
verdict: FAIL
date: 2025-01-15T14:30:00.000Z
---

followed by the checks table (with Check | Mode | Result | Notes columns) and a summary of what failed.
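If you want to read the verdict from a script rather than by eye, a minimal sketch follows. It assumes the frontmatter is always flat `key: value` pairs between two `---` lines, as in the example above; `read_frontmatter` is a hypothetical helper, not part of GSD, and it is not a general YAML parser.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter delimited by `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


SAMPLE = """---
sliceId: S01
uatType: artifact-driven
verdict: FAIL
date: 2025-01-15T14:30:00.000Z
---
"""

fields = read_frontmatter(SAMPLE)
print(fields["sliceId"], fields["verdict"])
```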

A PARTIAL verdict occurs when some checks pass but others are marked NEEDS-HUMAN — for example, in human-experience mode when a check involves taste, visual judgment, or subjective assessment that GSD can’t honestly automate. Auto-mode blocks progression for PARTIAL just as it does for FAIL, since “some checks require human review” is not the same as “the slice is ready to ship.”

4. Investigate the failure

Read the UAT result file to understand exactly which checks failed and why:

cat .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md

For the special-character case, the failure notes make it clear: the search query is interpolated directly into a SQL string without sanitization. Special characters break the query.
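For longer result files, it can help to pull out only the rows that did not pass. The sketch below assumes the checks table keeps the four-column `Check | Mode | Result | Notes` layout shown earlier and that no cell contains a literal `|`. The name `failing_checks` is hypothetical.

```python
def failing_checks(table: str) -> list[tuple[str, str, str]]:
    """Return (check, result, notes) for every row whose result is not PASS."""
    rows = []
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip malformed rows, the header row, and the |---| separator row.
        if len(cells) != 4 or cells[0] == "Check" or set(cells[0]) <= set("-:"):
            continue
        check, _mode, result, notes = cells
        if result != "PASS":
            rows.append((check, result, notes))
    return rows


TABLE = """
| Check                              | Mode     | Result | Notes                                    |
|------------------------------------|----------|--------|------------------------------------------|
| Search returns results for "pasta" | artifact | PASS   | 3 results returned                       |
| Search handles pagination          | artifact | PASS   | Page 2 shows next batch                  |
| Search for "Grandma's Cookies"     | artifact | FAIL   | SQL error: unterminated string literal   |
| Search for "Mac & Cheese"          | artifact | FAIL   | Returns 0 results, expected 1            |
"""

for check, result, notes in failing_checks(TABLE):
    print(f"{result}: {check} ({notes})")
```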

5. Fix the issue

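With the cause identified, fix the underlying code. In this case, stop interpolating the raw search term into the SQL string: pass it as a bound (parameterized) value, or escape it correctly for your database, so that apostrophes and ampersands reach the query as literal text. Then add test coverage for the specific inputs that failed.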
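One common way to keep special characters from breaking a query is to pass the search term as a bound parameter instead of building the SQL string by hand. The sketch below is not Cookmate's actual code: it assumes an SQLite database with a hypothetical `recipes` table and a hypothetical `search_recipes` function, purely to show the technique.

```python
import sqlite3


def search_recipes(conn: sqlite3.Connection, term: str) -> list[str]:
    """Search recipe names using a bound parameter, never string interpolation."""
    # The "?" placeholder sends the term to the database as data, so quotes
    # and ampersands in it are never parsed as SQL.
    rows = conn.execute(
        "SELECT name FROM recipes WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return [name for (name,) in rows]


# Hypothetical in-memory data standing in for the real recipe store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipes (name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO recipes (name) VALUES (?)",
    [("Grandma's Cookies",), ("Mac & Cheese",), ("Pasta Primavera",)],
)

for query in ("Grandma's Cookies", "Mac & Cheese"):
    print(query, "->", search_recipes(conn, query))
```

Note that `%` and `_` inside the term still act as `LIKE` wildcards in this sketch. Whether that matters depends on how the real search is meant to behave.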

You have a few options depending on how significant the fix is:

Option A — Fix it directly. If the fix is straightforward, make the code changes yourself (or in a new /gsd quick task), then proceed to step 6.

Option B — Use /gsd steer. If the fix requires updating the slice plan or task approach, register a steering override before restarting. When auto-mode is not running, steer tells the agent to update plan documents immediately in the current conversation:

/gsd steer sanitize special characters in search queries before database interpolation

This writes the override to OVERRIDES.md and propagates it across active plan documents so the next task dispatch reflects the corrected approach.

Option C — PARTIAL / NEEDS-HUMAN verdict. When the UAT result is PARTIAL, review the NEEDS-HUMAN checks in the result file and perform them manually. If you’re satisfied with the slice, delete the result file and mark the UAT passed by restarting auto-mode (it will re-run UAT, which should now PASS the subjective checks that you’ve verified). Alternatively, skip the UAT unit entirely if you’ve confirmed the slice is complete:

/gsd skip run-uat/M002/S01

6. Delete the UAT result and restart

Once the fix is in place, delete the result file so GSD re-runs the UAT check from scratch:

rm .gsd/milestones/M002/slices/S01/S01-UAT-RESULT.md

Then restart auto-mode:

/gsd auto

GSD derives state, sees the slice is complete but has no UAT result, and dispatches run-uat again against the fixed code.
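If you reset UAT results often, the delete step can be wrapped in a small helper that builds the path from the milestone and slice IDs. This is a convenience sketch under the directory layout shown in this recipe; `uat_result_path` and `clear_uat_result` are hypothetical names, not GSD commands.

```python
from pathlib import Path


def uat_result_path(root: Path, milestone: str, slice_id: str) -> Path:
    """Build the result-file path using the layout shown in this recipe."""
    return (
        root / ".gsd" / "milestones" / milestone / "slices" / slice_id
        / f"{slice_id}-UAT-RESULT.md"
    )


def clear_uat_result(root: Path, milestone: str, slice_id: str) -> bool:
    """Delete the result file if present; return True if a file was removed."""
    path = uat_result_path(root, milestone, slice_id)
    if path.exists():
        path.unlink()
        return True
    return False


if __name__ == "__main__":
    # Equivalent to the rm command above, run from the project root.
    removed = clear_uat_result(Path("."), "M002", "S01")
    print("removed" if removed else "nothing to remove")
```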

7. UAT passes — next slice begins

This time all checks pass, including the special character queries. GSD writes the updated result with verdict: PASS and advances to the next slice.

.gsd/
└── milestones/
    └── M002/
        └── slices/
            └── S01/
                ├── S01-PLAN.md            ← T01–T03 ✓
                ├── S01-SUMMARY.md
                ├── S01-UAT.md
                └── S01-UAT-RESULT.md      ← verdict: PASS

What Gets Created

Key artifacts involved in a UAT failure cycle:

| File | Role |
|------|------|
| `.gsd/milestones/<MID>/slices/<SID>/<SID>-UAT.md` | UAT checks script, written by `complete-slice` and read by `run-uat` |
| `.gsd/milestones/<MID>/slices/<SID>/<SID>-UAT-RESULT.md` | UAT execution results, written by `run-uat` with YAML frontmatter and checks table |
| `.gsd/OVERRIDES.md` | Written if you use `/gsd steer` to guide the fix |
| `.gsd/completed-units.json` | Updated if you use `/gsd skip` to bypass the UAT unit |

The result file uses YAML frontmatter with four fields: sliceId, uatType, verdict, and date. The dispatcher reads verdict mechanically to decide whether to advance or pause. Only a PASS verdict advances auto-mode to the next slice.
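The decision described above can be summarized in a few lines. This is a sketch of the behavior as this recipe describes it, not GSD's dispatcher: `next_action` and its return strings are hypothetical, and `None` stands for "no result file exists yet".

```python
from typing import Optional


def next_action(verdict: Optional[str]) -> str:
    """Map the verdict field (or its absence) to what auto-mode does next."""
    if verdict is None:
        # No result file: the slice is complete but UAT has not run.
        return "dispatch run-uat"
    if verdict == "PASS":
        return "advance to next slice"
    # FAIL and PARTIAL both block progression.
    return "pause"


for v in (None, "FAIL", "PARTIAL", "PASS"):
    print(v, "->", next_action(v))
```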

UAT Modes

GSD supports six UAT modes, split by whether auto-mode pauses immediately after dispatch or waits for the verdict:

| Mode | How checks run | Pause behavior |
|------|----------------|----------------|
| `artifact-driven` | Shell, grep, file reads, scripts | Pauses only if verdict ≠ PASS |
| `browser-executable` | Browser navigation, screenshots, assertions | Pauses only if verdict ≠ PASS |
| `runtime-executable` | Command or script execution, stdout/stderr | Pauses only if verdict ≠ PASS |
| `live-runtime` | Real app/service, browser/network/runtime checks | Pauses immediately after dispatch |
| `mixed` | All automatable checks + explicit human-only list | Pauses immediately after dispatch |
| `human-experience` | Objective checks automated; taste/visual checks marked `NEEDS-HUMAN` | Pauses immediately after dispatch |

For human-experience mode, GSD automates setup, preconditions, screenshots, logs, and objective checks — but does not invent subjective PASS results. Taste-based or purely human-judgment checks are marked NEEDS-HUMAN, and the overall verdict is PARTIAL unless every required check was objective and passed.
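The mode table can be expressed as a lookup, which is handy if you script around auto-mode. The mode names come from this page. The dictionary and the `pauses_immediately` function are hypothetical and only mirror the pause behavior listed above.

```python
# True means auto-mode pauses right after dispatching run-uat;
# False means it pauses only if the verdict is not PASS.
PAUSES_AFTER_DISPATCH = {
    "artifact-driven": False,
    "browser-executable": False,
    "runtime-executable": False,
    "live-runtime": True,
    "mixed": True,
    "human-experience": True,
}


def pauses_immediately(mode: str) -> bool:
    """Report whether a UAT mode pauses auto-mode immediately after dispatch."""
    try:
        return PAUSES_AFTER_DISPATCH[mode]
    except KeyError:
        raise ValueError(f"unknown UAT mode: {mode!r}") from None


for mode in ("artifact-driven", "human-experience"):
    print(mode, "->", pauses_immediately(mode))
```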

Related Commands

/gsd auto — Restart after resolving the failure
/gsd steer — Inject a course-correction before restarting
/gsd skip — Bypass the UAT unit entirely if you’ve verified manually
/gsd capture — Record a follow-up thought without blocking the pipeline
/gsd status — Check current state and the active unit key for skip/steer
