
Workflow Engine and Quality Systems


The workflow is encoded as a machine-readable directed acyclic graph in workflow-graph.json:

flowchart TD
    S1["step-1: Requirements"]
    G1{{"gate-1: Approval"}}:::gate
    S2["step-2: Architecture"]
    G2{{"gate-2: Approval"}}:::gate
    S3["step-3: Design"]
    S35["step-3.5: Governance"]
    G25{{"gate-2.5: Approval"}}:::gate
    S4B["step-4b: Bicep Plan"]
    S4T["step-4t: TF Plan"]
    G3{{"gate-3: Approval"}}:::gate
    S5B["step-5b: Bicep Code"]
    S5T["step-5t: TF Code"]
    G4{{"gate-4: Validation"}}:::gate
    S6B["step-6b: Bicep Deploy"]
    S6T["step-6t: TF Deploy"]
    G5{{"gate-5: Approval"}}:::gate
    S7["step-7: As-Built"]:::endNode

    S1 --> G1 --> S2 --> G2
    G2 --> S3
    S3 --> S35
    S35 --> G25
    G25 --> S4B & S4T
    S4B & S4T --> G3
    G3 --> S5B & S5T
    S5B & S5T --> G4
    G4 --> S6B & S6T
    S6B & S6T --> G5
    G5 --> S7

Each node has a type (agent-step, gate, subagent-fan-out, validation), and each edge has a condition (on_complete, on_skip, on_fail). Conditional routing at IaC nodes is governed by the decisions.iac_tool field.
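
To make the routing rule concrete, here is a minimal sketch of how edge conditions and decisions.iac_tool could select the next node. The dictionary layout, the per-node iac_tool tag, and the values "bicep" and "terraform" are assumptions for illustration; the actual schema of workflow-graph.json may differ.

```python
# Hypothetical in-memory shape; not the real workflow-graph.json schema.
GRAPH = {
    "nodes": {
        "gate-2.5": {"type": "gate"},
        "step-4b": {"type": "agent-step", "iac_tool": "bicep"},
        "step-4t": {"type": "agent-step", "iac_tool": "terraform"},
        "gate-3": {"type": "gate"},
    },
    "edges": [
        {"from": "gate-2.5", "to": "step-4b", "condition": "on_complete"},
        {"from": "gate-2.5", "to": "step-4t", "condition": "on_complete"},
        {"from": "step-4b", "to": "gate-3", "condition": "on_complete"},
        {"from": "step-4t", "to": "gate-3", "condition": "on_complete"},
    ],
}


def next_nodes(graph, current, outcome, decisions):
    """Return the targets of edges leaving `current` whose condition matches `outcome`.

    Targets tagged with an iac_tool are kept only if the tag matches
    decisions["iac_tool"] (compared case-insensitively).
    """
    chosen = decisions.get("iac_tool", "").lower()
    result = []
    for edge in graph["edges"]:
        if edge["from"] != current or edge["condition"] != outcome:
            continue
        required = graph["nodes"][edge["to"]].get("iac_tool")
        if required is not None and required.lower() != chosen:
            continue  # target belongs to the inactive IaC track
        result.append(edge["to"])
    return result
```

Calling next_nodes(GRAPH, "gate-2.5", "on_complete", {"iac_tool": "terraform"}) follows only the edge into the Terraform track, while both tracks converge again on gate-3.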

Five mandatory gates hold the workflow until their condition is met. Four require explicit human confirmation; Gate 4 requires automated validation to pass:

| Gate | After | Blocks Until |
| --- | --- | --- |
| 1 | Step 1 | User approves requirements |
| 2 | Step 2 | User approves architecture and cost estimate |
| 3 | Step 4 | User approves implementation plan |
| 4 | Step 5 | Automated validation passes (lint, build, review) |
| 5 | Step 6 | User approves deployment and verifies resources |
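
The gate table can be read as a small lookup. The sketch below is illustrative only: the GATES structure, the "kind" labels, and the gate_is_open function are hypothetical names, not part of the workflow engine.

```python
# Gate numbers and preceding steps come from the table above.
# The "kind" labels are assumed names for the two blocking conditions.
GATES = {
    1: {"after_step": 1, "kind": "approval"},
    2: {"after_step": 2, "kind": "approval"},
    3: {"after_step": 4, "kind": "approval"},
    4: {"after_step": 5, "kind": "validation"},
    5: {"after_step": 6, "kind": "approval"},
}


def gate_is_open(gate, user_approved=False, validation_passed=False):
    """Return True when the blocking condition for `gate` is satisfied."""
    if GATES[gate]["kind"] == "validation":
        return validation_passed
    return user_approved
```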

The iac_tool field in 01-requirements.md determines which track is activated. Steps 4b, 5b, 6b form the Bicep track; steps 4t, 5t, 6t form the Terraform track. Only one track is active for a given project.

The 00-session-state.json file (schema v3.0) provides atomic state tracking:

{
  "schema_version": "3.0",
  "project": "my-project",
  "current_step": 2,
  "steps": {
    "2": {
      "status": "in_progress",
      "sub_step": "phase_2_waf",
      "started": "2026-03-04T10:05:00Z",
      "artifacts": ["agent-output/my-project/02-architecture-assessment.md"]
    }
  }
}

VS Code Copilot executes agents serially: only one agent runs at a time. Because concurrent agent execution does not occur, schema v3.0 removed the lock/claim protocol that v2.0 used. Atomic writes (.tmp → rename → .bak) prevent file corruption.
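
One plausible reading of the atomic-write sequence is sketched below. It assumes the new state is written to a .tmp file, the previous file is kept as a .bak copy, and the .tmp file is then renamed over the original. The function name and the exact ordering are assumptions, not the project's confirmed implementation.

```python
import json
import os
import shutil


def write_state_atomically(path, state):
    """Write `state` as JSON to `path` without exposing a half-written file."""
    tmp_path = path + ".tmp"
    bak_path = path + ".bak"

    # 1. Write the complete new content to a temporary sibling file.
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())

    # 2. Keep a copy of the previous version, if one exists (assumed .bak role).
    if os.path.exists(path):
        shutil.copy2(path, bak_path)

    # 3. Swap the new file into place; os.replace overwrites the target
    #    and is atomic when both paths are on the same filesystem.
    os.replace(tmp_path, path)
```

A reader that opens the state file therefore sees either the old content or the new content, never a partial write.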

At Gates 2 and 3, the Orchestrator recommends starting a fresh VS Code Copilot Chat session. Long-running sessions (3+ hours) experience forced context summarisations that lose critical decision context. The Session Break Protocol:

  1. Orchestrator writes the current state to 00-session-state.json
  2. Orchestrator writes 00-handoff.md with a human-readable summary
  3. Orchestrator prints a “SESSION BREAK RECOMMENDED” message
  4. User starts a new chat and invokes the Orchestrator again
  5. Orchestrator reads 00-session-state.json, finds the next pending step, and resumes

This was driven by real-world observation: the malta-catering end-to-end test experienced 5 forced context summarisations in a single 3h39m session.

Every convention is backed by a machine-enforceable check. The validation suite runs via two parallel groups: validate:_node (Node.js validators) and validate:_external (external tool validators):

| Category | Validators |
| --- | --- |
| Markdown | lint:md, lint:links:docs |
| Artefact format | validate:artifacts, lint:artifact-templates, lint:h2-sync |
| Agent quality | validate:agents |
| Skill quality | validate:skills, validate:skill-checks, lint:skill-references, lint:orphaned-content |
| Instruction quality | validate:instruction-checks |
| Governance | lint:governance-refs, lint:mcp-config |
| Infrastructure | lint:terraform-fmt, validate:terraform, validate:iac-security-baseline |
| Session state | validate:session-state (also covers deprecated lock/claim field detection) |
| Registry/config | validate:workflow-graph, validate:agent-registry |
| Code quality | lint:json, lint:python, lint:yaml |
| VS Code config | validate:vscode |
| Explorer graph | validate:explorer-graph |
| Meta | lint:version-sync, lint:deprecated-refs, lint:docs-freshness, lint:glob-audit, validate:no-hardcoded-counts, validate:terminology |

See reference/validation-reference for the full authoritative list — it is generated from package.json.

All validators run via npm run validate:all.

Pre-commit (sequential, via lefthook): Validates staged files only — markdown lint, link checks, H2 sync, artefact templates, agent frontmatter, instruction frontmatter, Python lint, Terraform format and validate.

Pre-push (parallel, via lefthook): Diff-based domain routing. The diff-based-push-check.sh script categorises changed files and runs only matching validators:

  • *.bicep → Bicep build + lint
  • *.tf → Terraform fmt + validate
  • *.agent.md → Agent frontmatter + body size
  • *.instructions.md → Instruction frontmatter
  • SKILL.md → Skills format + skill size
  • *.json → JSON syntax
  • *.py → Ruff lint
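
The routing idea behind the pre-push check can be sketched as a pattern table. The real check is the diff-based-push-check.sh shell script; the Python below is only an illustration of the mapping listed above, and ROUTES and validators_for are hypothetical names.

```python
import posixpath
from fnmatch import fnmatchcase

# Pattern -> validator label, taken from the list above.
ROUTES = [
    ("*.bicep", "Bicep build + lint"),
    ("*.tf", "Terraform fmt + validate"),
    ("*.agent.md", "Agent frontmatter + body size"),
    ("*.instructions.md", "Instruction frontmatter"),
    ("SKILL.md", "Skills format + skill size"),
    ("*.json", "JSON syntax"),
    ("*.py", "Ruff lint"),
]


def validators_for(changed_files):
    """Return the validators matching the changed files, each listed once."""
    selected = []
    for path in changed_files:
        name = posixpath.basename(path)  # match on the file name only
        for pattern, validator in ROUTES:
            if fnmatchcase(name, pattern) and validator not in selected:
                selected.append(validator)
    return selected
```

Files that match no pattern, such as a plain README.md, trigger no validator in this sketch.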

The circuit breaker pattern protects against runaway agent loops during deployment:

| Anomaly Pattern | Detection Threshold | Action |
| --- | --- | --- |
| Error repetition | 3 consecutive | Halt, write blocked finding |
| Empty response loop | 3 consecutive | Halt, escalate to human |
| Timeout cascade | 3 consecutive | Halt, check auth |
| What-if oscillation | 2 cycles | Halt, flag resource conflict |
| Auth failure loop | 2 consecutive | Halt, prompt re-authentication |
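
As an illustration of the thresholds above, a consecutive-count breaker could be sketched as follows. The anomaly key names and the CircuitBreaker class are hypothetical, and a what-if oscillation cycle is simply assumed to be reported by the caller as one event.

```python
# Thresholds and actions copied from the table above; the key names are hypothetical.
THRESHOLDS = {
    "error_repetition": (3, "halt: write blocked finding"),
    "empty_response": (3, "halt: escalate to human"),
    "timeout": (3, "halt: check auth"),
    "what_if_oscillation": (2, "halt: flag resource conflict"),
    "auth_failure": (2, "halt: prompt re-authentication"),
}


class CircuitBreaker:
    """Counts consecutive occurrences of one anomaly kind and trips at its threshold."""

    def __init__(self):
        self.kind = None
        self.count = 0

    def record(self, anomaly):
        """Record one event; return the action if the breaker trips, else None.

        Pass None for a healthy event, which resets the streak.
        """
        if anomaly is None:
            self.kind, self.count = None, 0
            return None
        if anomaly == self.kind:
            self.count += 1
        else:
            self.kind, self.count = anomaly, 1  # a different anomaly starts a new streak
        limit, action = THRESHOLDS[anomaly]
        return action if self.count >= limit else None
```

Resetting on any healthy event is what makes the count "consecutive": two errors, a success, and another error do not trip the breaker.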

The context-shredding system defines three compression tiers for artefact loading:

| Tier | Trigger | Strategy |
| --- | --- | --- |
| full | < 60% used | Load entire artefact |
| summarized | 60–80% | Key H2 sections only (tables preserved) |
| minimal | > 80% | Decision summaries only (< 500 characters) |
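
Tier selection from the table above reduces to two comparisons. The sketch assumes the trigger is the fraction of the context window already used, expressed between 0.0 and 1.0; select_tier is a hypothetical name.

```python
def select_tier(context_used):
    """Map the used fraction of the context (0.0 to 1.0) to a compression tier."""
    if not 0.0 <= context_used <= 1.0:
        raise ValueError("context_used must be between 0.0 and 1.0")
    if context_used < 0.60:
        return "full"
    if context_used <= 0.80:
        return "summarized"
    return "minimal"
```

The boundaries follow the table literally: exactly 60% and exactly 80% both fall in the summarized tier.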

When the challenger-review-subagent loads predecessor artefacts for review, it is instructed to apply the same 3-tier compression. At the summarized tier, it keeps only the resource list, SKUs, WAF scores, compliance matrix, and budget sections; at the minimal tier, it uses only the decisions field from 00-session-state.json plus the resource list. How consistently the LLM follows these instructions varies; the compact_for_parent carry-forward between passes is the part that works reliably.

Copilot hooks in .github/hooks/ intercept agent actions at runtime. See the Hooks guide for the authoritative list; the current set covers:

| Hook | Trigger | Purpose |
| --- | --- | --- |
| tool-guardian | PreToolUse | Blocks dangerous commands (destructive ops, force pushes, DB drops) |
| secrets-scanner | Stop | Scans modified files for leaked secrets and credentials |
| session-telemetry | SessionStart, Stop, UserPromptSubmit | Merged session lifecycle logging and governance audit |
| subagent-validation | SubagentStop | Validates subagent invocation and outputs |
| tool-audit | PostToolUse | Logs tool usage metadata (name, status) |

Hooks are defined in hooks.json files, each entry specifying a type (command), the path to a shell script, and a timeout. They run automatically; agents do not invoke them explicitly.
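
To give a feel for what a PreToolUse guard such as tool-guardian checks, here is a minimal sketch of command screening. The actual hooks are shell scripts; the patterns and the is_blocked function below are illustrative assumptions covering the three categories named in the table (destructive ops, force pushes, DB drops), not the project's real rule set.

```python
import re

# Hypothetical patterns, one per category named in the hook table.
BLOCKED_PATTERNS = [
    # Recursive forced delete: rm -rf, rm -fr, and similar flag combinations.
    re.compile(r"\brm\s+-(?:[a-zA-Z]*r[a-zA-Z]*f|[a-zA-Z]*f[a-zA-Z]*r)"),
    # Force pushes: git push --force, --force-with-lease, or -f.
    re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)"),
    # Database or table drops in SQL passed on the command line.
    re.compile(r"\bdrop\s+(?:database|table)\b", re.IGNORECASE),
]


def is_blocked(command):
    """Return True if the command matches any blocked pattern."""
    return any(pattern.search(command) for pattern in BLOCKED_PATTERNS)
```

A real guard would need a far more careful rule set; simple regular expressions like these are easy to bypass and are shown only to make the hook's purpose tangible.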