CI/CD Pipeline Guide
Overview
Section titled “Overview”GSD 2 uses a three-stage promotion pipeline that automatically moves merged PRs through Dev → Test → Prod environments using npm dist-tags.
PR merged to main │ ▼ ┌─────────┐ ci.yml passes (build, test, typecheck) │ DEV │ → publishes gsd-pi@<version>-dev.<sha> with @dev tag └────┬────┘ ▼ (automatic if green) ┌─────────┐ CLI smoke tests + LLM fixture replay │ TEST │ → promotes to @next tag └────┬────┘ → pushes Docker image as :next ▼ (manual approval required) ┌─────────┐ optional real-LLM integration tests │ PROD │ → promotes to @latest tag └─────────┘ → creates GitHub ReleaseFor Contributors: Testing Your PR Before It Ships
Section titled “For Contributors: Testing Your PR Before It Ships”Install the Dev Build
Section titled “Install the Dev Build”Every merged PR is immediately installable:
# Latest dev build (bleeding edge, every merged PR)npx gsd-pi@dev
# Test candidate (passed smoke + fixture tests)npx gsd-pi@next
# Stable production releasenpx gsd-pi@latest # or just: npx gsd-piUsing Docker
Section titled “Using Docker”# Test candidatedocker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:next --version
# Stabledocker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:latest --versionChecking if a Fix Landed
Section titled “Checking if a Fix Landed”- Find the PR’s merge commit SHA (first 7 chars)
- Check if it’s in
@dev:npm view gsd-pi@dev version- If the version ends in
-dev.<your-sha>, your PR is in dev
- If the version ends in
- Check if it promoted to
@next:npm view gsd-pi@next version - Check if it’s in production:
npm view gsd-pi@latest version
For Maintainers
Section titled “For Maintainers”Pipeline Workflows
Section titled “Pipeline Workflows”| Workflow | File | Trigger | Purpose |
|---|---|---|---|
| CI | ci.yml | PR + push to main | Build, test, typecheck — gate for all promotions |
| Release Pipeline | pipeline.yml | After CI succeeds on main | Three-stage promotion |
| Native Binaries | build-native.yml | v* tags | Cross-compile platform binaries |
| Dev Cleanup | cleanup-dev-versions.yml | Weekly (Monday 06:00 UTC) | Unpublish -dev. versions older than 30 days |
| AI Triage | triage.yml | New issues + PRs | Automated classification via Claude Haiku (v2.36) |
CI optimization (v2.38): GitHub Actions minutes were reduced ~60-70% (~10k → ~3-4k/month) through workflow consolidation and caching improvements.
Pipeline optimization (v2.41):
- Shallow clones — CI lint and build jobs use
fetch-depth: 1orfetch-depth: 2instead of full history, saving ~30-60s per job - npm cache in pipeline — dev-publish, test-verify, and prod-release now use
cache: 'npm'on setup-node, saving ~1-2 min per job on repeat runs - Exponential backoff — npm registry propagation waits in
build-native.ymlreplaced hardcodedsleep 30+ fixed 15s retries with exponential backoff (5s → 10s → 20s → 30s cap), typically finishing in <15s when the registry is fast - Security hardening — pipeline.yml moved
${{ }}expressions fromrun:blocks toenv:variables to prevent command injection vectors
Docs-Only PR Detection (v2.41)
Section titled “Docs-Only PR Detection (v2.41)”CI automatically detects when a PR contains only documentation changes (.md files and docs/ content). When docs-only:
- Skipped:
build,windows-portability(no code to compile or test) - Still runs:
lint(secret scanning,.gsd/check),docs-check(prompt injection scan)
This saves CI minutes on documentation PRs while still enforcing security checks.
Prompt Injection Scan (v2.41)
Section titled “Prompt Injection Scan (v2.41)”The docs-check job runs scripts/docs-prompt-injection-scan.sh on every PR that touches markdown files. It scans documentation prose (excluding fenced code blocks) for patterns that could manipulate LLM behavior when docs are ingested as context:
- System prompt markers —
<system-prompt>,<|im_start|>system,[SYSTEM]: - Role/instruction overrides —
ignore previous instructions,you are now,new instructions: - Hidden HTML directives —
<!-- PROMPT:,<!-- INSTRUCTION: - Tool call injection —
<tool_call>,<function_call>,<invoke - Invisible Unicode — zero-width character sequences that hide directives
Content inside fenced code blocks (```) is excluded — patterns in code examples are expected and legitimate.
False positives: Add exceptions to .prompt-injection-scanignore using the same format as .secretscanignore (one pattern per line, file:regex for file-scoped exceptions).
Gating Tests
Section titled “Gating Tests”The pipeline only triggers after ci.yml passes. Key gating tests include:
- Unit tests (
npm run test:unit) — includesauto-session-encapsulation.test.tswhich enforces that all auto-mode state is encapsulated inAutoSession, plus dispatch loop regression tests that exercise the fullderiveState → resolveDispatch → idempotencychain without an LLM. Any PR adding module-level mutable state toauto.tswill fail CI and block the pipeline. - Integration tests (
npm run test:integration) - Extension typecheck (
npm run typecheck:extensions) - Package validation (
npm run validate-pack) - Smoke tests (
npm run test:smoke) — run post-build in the pipeline against the local binary and again against the globally-installed@devpackage - Fixture tests (
npm run test:fixtures) — replay recorded LLM conversations without hitting real APIs - Live regression tests (
npm run test:live-regression) — run against the installed binary in the Test stage to catch runtime regressions before promotion to@next
Approving a Prod Release
Section titled “Approving a Prod Release”- A version reaches the Test stage automatically
- In GitHub Actions, the
prod-releasejob will show “Waiting for review” - Click Review deployments → select
prod→ Approve - The version is promoted to
@latestand a GitHub Release is created
To enable live LLM tests during Prod promotion:
- Set the
RUN_LIVE_TESTSenvironment variable totrueon theprodenvironment
Rolling Back a Release
Section titled “Rolling Back a Release”If a broken version reaches production:
# Roll back npmnpm dist-tag add gsd-pi@<previous-good-version> latest
# Roll back Dockerdocker pull ghcr.io/gsd-build/gsd-pi:<previous-good-version>docker tag ghcr.io/gsd-build/gsd-pi:<previous-good-version> ghcr.io/gsd-build/gsd-pi:latestdocker push ghcr.io/gsd-build/gsd-pi:latestFor @dev or @next rollbacks, the next successful merge will overwrite the tag automatically.
GitHub Configuration Required
Section titled “GitHub Configuration Required”| Setting | Value |
|---|---|
Environment: dev | No protection rules |
Environment: test | No protection rules |
Environment: prod | Required reviewers: maintainers |
Secret: NPM_TOKEN | All environments |
Secret: ANTHROPIC_API_KEY | Prod environment only |
Secret: OPENAI_API_KEY | Prod environment only |
Variable: RUN_LIVE_TESTS | false (set to true to enable live LLM tests) |
| GHCR | Enabled for the gsd-build org |
Docker Images
Section titled “Docker Images”| Image | Base | Purpose | Tags |
|---|---|---|---|
ghcr.io/gsd-build/gsd-ci-builder | node:24-bookworm | CI build environment with Rust toolchain | :latest, :<date> |
ghcr.io/gsd-build/gsd-pi | node:24-slim | User-facing runtime | :latest, :next, :v<version> |
The CI builder image is rebuilt automatically when the Dockerfile changes. It eliminates ~3-5 min of toolchain setup per CI run.
LLM Fixture Tests
Section titled “LLM Fixture Tests”The fixture system records and replays LLM conversations without hitting real APIs (zero cost).
Running Fixture Tests
Section titled “Running Fixture Tests”npm run test:fixturesRecording New Fixtures
Section titled “Recording New Fixtures”# Set your API key, then recordGSD_FIXTURE_MODE=record GSD_FIXTURE_DIR=./tests/fixtures/recordings \ node --experimental-strip-types tests/fixtures/record.tsFixtures are JSON files in tests/fixtures/recordings/. Each one captures a conversation’s request/response pairs and replays them by turn index.
When to Re-Record
Section titled “When to Re-Record”Re-record fixtures when:
- Provider wire format changes (e.g., new field in Anthropic response)
- Tool definitions change (affects request shape)
- System prompt changes (may cause turn count mismatch)
Version Strategy
Section titled “Version Strategy”| Tag | Published | Format | Who uses it |
|---|---|---|---|
@dev | Every merged PR | 2.27.0-dev.a3f2c1b | Developers verifying fixes |
@next | Auto-promoted from dev | Same version | Early adopters, beta testers |
@latest | Manually approved | Same version | Production users |
Old -dev. versions are cleaned up weekly (30-day retention).