Workflow Engine and Quality Systems

The workflow is encoded as a machine-readable directed acyclic graph in workflow-graph.json:

flowchart TD
    S1["step-1: Requirements"]
    G1{{"gate-1: Approval"}}:::gate
    S2["step-2: Architecture"]
    G2{{"gate-2: Approval"}}:::gate
    S3["step-3: Design"]
    S35["step-3.5: Governance"]
    G25{{"gate-2.5: Approval"}}:::gate
    S4B["step-4b: Bicep Plan"]
    S4T["step-4t: TF Plan"]
    G3{{"gate-3: Approval"}}:::gate
    S5B["step-5b: Bicep Code"]
    S5T["step-5t: TF Code"]
    G4{{"gate-4: Validation"}}:::gate
    S6B["step-6b: Bicep Deploy"]
    S6T["step-6t: TF Deploy"]
    G5{{"gate-5: Approval"}}:::gate
    S7["step-7: As-Built"]:::endNode

    S1 --> G1 --> S2 --> G2
    G2 --> S3
    S3 --> S35
    S35 --> G25
    G25 --> S4B & S4T
    S4B & S4T --> G3
    G3 --> S5B & S5T
    S5B & S5T --> G4
    G4 --> S6B & S6T
    S6B & S6T --> G5
    G5 --> S7

Each node has a type (agent-step, gate, subagent-fan-out, validation), and each edge has a condition (on_complete, on_skip, on_fail). Conditional routing at IaC nodes is governed by the decisions.iac_tool field.
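The acyclicity requirement can be checked mechanically. The following is a minimal sketch, not the project's validate:workflow-graph implementation: the edge keys (`from`, `to`, `condition`) and the sample edge list are assumptions made for illustration, and the real workflow-graph.json schema may differ.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical edge list mirroring the first part of the flowchart above.
# The key names are assumed, not taken from the real workflow-graph.json.
EDGES = [
    {"from": "step-1", "to": "gate-1", "condition": "on_complete"},
    {"from": "gate-1", "to": "step-2", "condition": "on_complete"},
    {"from": "step-2", "to": "gate-2", "condition": "on_complete"},
    {"from": "gate-2", "to": "step-3", "condition": "on_complete"},
]


def topological_order(edges):
    """Return node ids in dependency order, or raise ValueError on a cycle."""
    sorter = TopologicalSorter()
    for edge in edges:
        # add(node, *predecessors): the "to" node depends on the "from" node.
        sorter.add(edge["to"], edge["from"])
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"workflow graph contains a cycle: {exc.args[1]}") from exc
```

Calling `topological_order(EDGES)` yields the nodes in execution order; adding an edge from step-3 back to step-1 would raise `ValueError`.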

Five mandatory gates block the workflow until their condition is met. Gates 1, 2, 3, and 5 require explicit human confirmation; Gate 4 passes on automated validation:

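| Gate | After  | Blocks Until                                       |
|------|--------|----------------------------------------------------|
| 1    | Step 1 | User approves requirements                         |
| 2    | Step 2 | User approves architecture and cost estimate       |
| 3    | Step 4 | User approves implementation plan                  |
| 4    | Step 5 | Automated validation passes (lint, build, review)  |
| 5    | Step 6 | User approves deployment and verifies resources    |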

The iac_tool field in 01-requirements.md determines which track is activated. Steps 4b, 5b, 6b form the Bicep track; steps 4t, 5t, 6t form the Terraform track. Only one track is active for a given project.
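Track selection amounts to a lookup keyed on iac_tool. The sketch below assumes the field holds the literal values `bicep` or `terraform`, which the text does not confirm; the step ids are taken from the flowchart.

```python
# Step ids come from the flowchart; the "bicep"/"terraform" keys are assumed values.
TRACKS = {
    "bicep": ["step-4b", "step-5b", "step-6b"],
    "terraform": ["step-4t", "step-5t", "step-6t"],
}


def active_track(iac_tool: str) -> list[str]:
    """Return the step ids of the single track activated by iac_tool."""
    key = iac_tool.strip().lower()
    if key not in TRACKS:
        raise ValueError(f"unsupported iac_tool: {iac_tool!r}")
    return TRACKS[key]
```

Rejecting unknown values, rather than defaulting to one track, keeps a typo in 01-requirements.md from silently activating the wrong track.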

The 00-session-state.json file (schema v2.0) provides atomic state tracking:

{
  "schema_version": "2.0",
  "project": "my-project",
  "current_step": 2, // (1)
  "lock": {
    "owner_id": "copilot-session-abc123", // (2)
    "heartbeat": "2026-03-04T10:15:00Z",
    "attempt_token": "550e8400-e29b-41d4-a716-446655440000" // (3)
  },
  "steps": {
    "2": {
      "status": "in_progress",
      "sub_step": "phase_2_waf",
      "claim": {
        "owner_id": "copilot-session-abc123",
        "heartbeat": "2026-03-04T10:15:00Z",
        "attempt_token": "550e8400-e29b-41d4-a716-446655440000",
        "retry_count": 0,
        "event_log": []
      }
    }
  }
}

  1. Tracks which step is active; the Conductor uses this for resume
  2. Claim-based locking prevents concurrent sessions from corrupting state
  3. Unique token per attempt; stale heartbeats are auto-recovered

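The claim model prevents concurrent sessions from corrupting state: each claim records which session owns the step (owner_id), when that session last reported in (heartbeat), and a unique attempt_token. A claim whose heartbeat is older than stale_threshold_ms (default 5 minutes) is considered stale and is recovered automatically.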
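The staleness rule can be sketched as follows. The field names come from the JSON example above; the function names and the takeover rule in `can_claim` are illustrative assumptions, not the project's validate:session-lock logic.

```python
from datetime import datetime, timezone

STALE_THRESHOLD_MS = 5 * 60 * 1000  # default of 5 minutes, per the text


def is_stale(claim: dict, now: datetime, stale_threshold_ms: int = STALE_THRESHOLD_MS) -> bool:
    """True when the claim's heartbeat is older than the threshold.

    `now` must be timezone-aware, because heartbeats are UTC timestamps.
    """
    # Replace the trailing "Z" so fromisoformat also works on older Python versions.
    heartbeat = datetime.fromisoformat(claim["heartbeat"].replace("Z", "+00:00"))
    age_ms = (now - heartbeat).total_seconds() * 1000
    return age_ms > stale_threshold_ms


def can_claim(claim, owner_id: str, now: datetime) -> bool:
    """Assumed rule: a session may claim a step that is unclaimed,
    already its own, or held by a stale claim."""
    if claim is None or claim["owner_id"] == owner_id:
        return True
    return is_stale(claim, now)
```

With the heartbeat from the example (10:15:00Z), a second session asking at 10:19 is refused, while the same request at 10:21 succeeds because the claim has gone stale.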

At Gates 2 and 3, the Conductor recommends starting a fresh VS Code Copilot Chat session. Long-running sessions (3+ hours) experience forced context summarisations that lose critical decision context. The Session Break Protocol:

  1. Conductor writes current state to 00-session-state.json
  2. Conductor writes 00-handoff.md with human-readable summary
  3. Conductor prints a “SESSION BREAK RECOMMENDED” message
  4. User starts a new chat, invokes Conductor again
  5. Conductor reads 00-session-state.json, finds the next pending step, and resumes
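The resume step of this protocol can be illustrated with a short sketch. Only the `in_progress` status appears in the state example above; the `complete` and `pending` status values and the `STEP_ORDER` list are assumptions made for illustration.

```python
# Assumed step keys and ordering; the real state file may key steps differently.
STEP_ORDER = ["1", "2", "3", "3.5", "4", "5", "6", "7"]


def next_step(state: dict):
    """Return the first step that is not complete, or None when all are done."""
    steps = state.get("steps", {})
    for step_id in STEP_ORDER:
        # A step with no entry yet is treated as pending.
        status = steps.get(step_id, {}).get("status", "pending")
        if status != "complete":
            return step_id
    return None
```

In this sketch an `in_progress` step is returned as the resume point, so a session that broke off mid-step picks up the same step rather than skipping it.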

This was driven by real-world observation: the nordic-fresh-foods end-to-end test experienced 5 forced context summarisations in a single 3h39m session.

Every convention is backed by a machine-enforceable check. The validation suite runs as two parallel groups, validate:_node (Node.js validators) and validate:_external (external tool validators):

| Category            | Validators                                                                       |
|---------------------|----------------------------------------------------------------------------------|
| Markdown            | lint:md, lint:links:docs                                                         |
| Artefact format     | lint:artifact-templates, lint:h2-sync, fix:artifact-h2                           |
| Agent quality       | lint:agent-frontmatter, lint:agent-body-size                                     |
| Skill quality       | lint:skills-format, lint:skill-size, lint:skill-references, lint:orphaned-content |
| Instruction quality | lint:instruction-frontmatter, validate:instruction-refs                          |
| Governance          | lint:governance-refs, lint:mcp-config                                            |
| Infrastructure      | lint:terraform-fmt, validate:terraform                                           |
| Session state       | validate:session-state, validate:session-lock                                    |
| Registry/config     | validate:workflow-graph, validate:agent-registry, validate:skill-affinity        |
| Code quality        | lint:json, lint:python, lint:yaml                                                |
| VS Code config      | validate:vscode                                                                  |
| Meta                | lint:version-sync, lint:deprecated-refs, lint:docs-freshness, lint:glob-audit    |

All validators run via npm run validate:all.

Pre-commit (sequential, via lefthook): Validates staged files only — markdown lint, link checks, H2 sync, artefact templates, agent frontmatter, instruction frontmatter, Python lint, Terraform format and validate.

Pre-push (parallel, via lefthook): Diff-based domain routing. The diff-based-push-check.sh script categorises changed files and runs only matching validators:

  • *.bicep → Bicep build + lint
  • *.tf → Terraform fmt + validate
  • *.agent.md → Agent frontmatter + body size
  • *.instructions.md → Instruction frontmatter
  • SKILL.md → Skills format + skill size
  • *.json → JSON syntax
  • *.py → Ruff lint

The circuit breaker pattern protects against runaway agent loops during deployment:

| Anomaly Pattern     | Detection Threshold | Action                         |
|---------------------|---------------------|--------------------------------|
| Error repetition    | 3 consecutive       | Halt, write blocked finding    |
| Empty response loop | 3 consecutive       | Halt, escalate to human        |
| Timeout cascade     | 3 consecutive       | Halt, check auth               |
| What-if oscillation | 2 cycles            | Halt, flag resource conflict   |
| Auth failure loop   | 2 consecutive       | Halt, prompt re-authentication |
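A consecutive-count breaker over this table might look like the sketch below. The thresholds and actions are taken from the table; the anomaly keys, the class, and the rule that any healthy event resets the count are assumptions, and what-if oscillation is simplified to a count of observed cycles.

```python
# Thresholds and actions from the table; the dictionary keys are hypothetical names.
THRESHOLDS = {
    "error_repetition": (3, "Halt, write blocked finding"),
    "empty_response": (3, "Halt, escalate to human"),
    "timeout": (3, "Halt, check auth"),
    "what_if_oscillation": (2, "Halt, flag resource conflict"),
    "auth_failure": (2, "Halt, prompt re-authentication"),
}


class CircuitBreaker:
    """Counts consecutive occurrences of the same anomaly."""

    def __init__(self):
        self._last = None
        self._count = 0

    def record(self, anomaly):
        """Record one event; pass None for a healthy event.

        Returns the action string once the threshold is reached, else None.
        """
        if anomaly is None:
            self._last, self._count = None, 0
            return None
        if anomaly == self._last:
            self._count += 1
        else:
            self._last, self._count = anomaly, 1
        threshold, action = THRESHOLDS[anomaly]
        return action if self._count >= threshold else None
```

Because a different anomaly or a healthy event restarts the count, isolated failures spread across a deployment do not trip the breaker; only an unbroken run does.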

The context-shredding system defines three compression tiers for artifact loading:

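| Tier       | Trigger (context used) | Strategy                                   |
|------------|------------------------|--------------------------------------------|
| full       | < 60%                  | Load entire artefact                       |
| summarized | 60–80%                 | Key H2 sections only (tables preserved)    |
| minimal    | > 80%                  | Decision summaries only (< 500 characters) |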
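Tier selection reduces to two threshold comparisons. In this sketch the function name and the use of a 0.0 to 1.0 fraction are assumptions, and the boundaries follow the table, so exactly 60% and exactly 80% both fall in the summarized tier.

```python
def compression_tier(context_used: float) -> str:
    """Map the fraction of context used (0.0 to 1.0) to a compression tier."""
    if not 0.0 <= context_used <= 1.0:
        raise ValueError(f"context_used must be between 0 and 1, got {context_used}")
    if context_used < 0.60:
        return "full"
    if context_used <= 0.80:
        return "summarized"
    return "minimal"
```

For example, `compression_tier(0.5)` returns `"full"` and `compression_tier(0.81)` returns `"minimal"`.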

When the challenger-review-subagent loads predecessor artefacts for review, it is instructed to apply the same three tiers. At the summarized tier it keeps only the resource list, SKUs, WAF scores, compliance matrix, and budget sections. At the minimal tier it uses only the decisions field from 00-session-state.json plus the resource list. How consistently the LLM follows these instructions varies; the compact_for_parent carry-forward between passes is the part that works reliably.

The project uses five Copilot hooks (.github/hooks/) that intercept agent actions at runtime:

| Hook             | Trigger             | Purpose                                                                |
|------------------|---------------------|------------------------------------------------------------------------|
| tool-guardian    | preToolUse          | Blocks dangerous commands (destructive ops, force pushes, DB drops)    |
| secrets-scanner  | sessionEnd          | Scans modified files for leaked secrets and credentials                |
| session-logger   | sessionStart        | Logs session lifecycle and injects project context                     |
| governance-audit | userPromptSubmitted | Scans prompts for threat signals with governance levels                |
| post-edit-format | PostToolUse         | Auto-formats files after agent edits (whitespace, trailing newlines)   |

Hooks are defined in hooks.json files with type (command), path to shell script, and timeout. They run automatically — agents do not invoke them explicitly.
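To give a sense of what a preToolUse guard such as tool-guardian does, here is a hypothetical sketch. The actual hooks are shell scripts, and the deny patterns below are invented for illustration; they cover one example from each category named in the table and are not the project's rule set.

```python
import re

# Hypothetical deny list: one illustrative pattern per category from the table.
DENY_PATTERNS = [
    (re.compile(r"\brm\s+-(?:rf|fr)\b"), "destructive delete"),
    (re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)"), "force push"),
    (re.compile(r"\bdrop\s+(?:database|table)\b", re.IGNORECASE), "database drop"),
]


def check_command(command: str):
    """Return the reason a command should be blocked, or None to allow it."""
    for pattern, reason in DENY_PATTERNS:
        if pattern.search(command):
            return reason
    return None
```

A guard of this shape only blocks what it can recognise, so a short pattern list like this one would need to be far more complete before it could be relied on.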