E2E Testing with Ralph Loop

End-to-end testing for the InfraOps pipeline using the autonomous Ralph Loop pattern.

Ralph Loop is a self-correcting E2E evaluation workflow that runs all InfraOps pipeline steps without human gates. It validates the entire agent pipeline autonomously — from requirements through deployment to documentation — with built-in self-correction, challenger reviews, and benchmark scoring.

Key characteristics:

  • Autonomous: all gates auto-approve after validation passes
  • Self-correcting: validation failures feed findings back to the agent for retry (max 5 per step)
  • IaC-agnostic: supports both Bicep and Terraform tracks
  • Dry-run only: never deploys real Azure resources (uses bicep what-if or terraform plan)
  • Benchmarked: produces an 8-dimension quality score (0–100)
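The self-correcting behaviour listed above can be pictured as a bounded retry loop. The following is a minimal sketch, not the project's implementation: `runStep` and `validateStep` are hypothetical callbacks standing in for the agent invocation and the step validator, and the real loop is presumably asynchronous.

```javascript
// Hypothetical sketch of the self-correction loop; names are illustrative only.
const MAX_ATTEMPTS = 5; // "max 5 per step" from the characteristics list

function runStepWithSelfCorrection(step, runStep, validateStep) {
  let findings = []; // validation findings fed back into the next attempt
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const output = runStep(step, findings);     // agent sees the prior findings
    const result = validateStep(step, output);  // expected shape: { passed, findings }
    if (result.passed) {
      // Validation passed, so the gate auto-approves without a human.
      return { step, passed: true, attempts: attempt };
    }
    findings = result.findings;
  }
  // Retry budget exhausted: report the last findings instead of looping forever.
  return { step, passed: false, attempts: MAX_ATTEMPTS, findings };
}

// Usage with stubs: the fake validator rejects the first two attempts.
let calls = 0;
const demo = runStepWithSelfCorrection(
  "5",
  () => ++calls,
  (_step, output) =>
    output >= 3
      ? { passed: true, findings: [] }
      : { passed: false, findings: [`attempt ${output} failed validation`] }
);
console.log(demo); // { step: '5', passed: true, attempts: 3 }
```

Capping the attempts keeps a step that can never pass from stalling the whole run, and returning the final findings leaves something to record as a lesson learned.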
| Track | Code Directory | Entry File | Validation Commands |
| --- | --- | --- | --- |
| Bicep | infra/bicep/{project}/ | main.bicep | bicep build, bicep lint |
| Terraform | infra/terraform/{project}/ | main.tf | terraform validate, terraform fmt -check |

The IaC tool is read from decisions.iac_tool in 00-session-state.json. To switch tracks, change the IaC tool field in the prompt file’s Project Context section.
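As an illustration of how the track could be resolved from session state, here is a minimal sketch. The `resolveTrack` function and the shape of the `TRACKS` map are assumptions made for this example; only the `decisions.iac_tool` field, the directories, the entry files, and the validation commands come from the text above.

```javascript
// Hypothetical lookup table mirroring the track table; not the project's real code.
const TRACKS = new Map([
  ["bicep", {
    codeDir: "infra/bicep/{project}/",
    entryFile: "main.bicep",
    validation: ["bicep build", "bicep lint"],
  }],
  ["terraform", {
    codeDir: "infra/terraform/{project}/",
    entryFile: "main.tf",
    validation: ["terraform validate", "terraform fmt -check"],
  }],
]);

// sessionState is the parsed content of 00-session-state.json.
function resolveTrack(sessionState, project) {
  const tool = sessionState?.decisions?.iac_tool;
  const track = typeof tool === "string" ? TRACKS.get(tool.toLowerCase()) : undefined;
  if (!track) {
    throw new Error(`Unsupported or missing decisions.iac_tool: ${String(tool)}`);
  }
  // Substitute the project name into the directory template.
  return { ...track, codeDir: track.codeDir.replace("{project}", project) };
}

const tf = resolveTrack({ decisions: { iac_tool: "Terraform" } }, "terraform-e2e");
console.log(tf.codeDir, tf.entryFile); // infra/terraform/terraform-e2e/ main.tf
```

Failing loudly on an unknown value surfaces a misconfigured session state early instead of silently defaulting to one track.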

Validate all E2E artifacts for structural compliance (fast, no agent invocation):

# Default project
npm run e2e:validate
# Specific project
node scripts/validate-e2e-step.mjs --project=terraform-e2e all
# Single step
node scripts/validate-e2e-step.mjs --project=e2e-ralph-loop 5

Run the 8-dimension benchmark and generate a report:

# Default project (e2e-ralph-loop)
npm run e2e:benchmark
# Terraform project
npm run e2e:benchmark -- terraform-e2e
# Multi-project comparison
npm run e2e:benchmark -- --compare

Open VS Code Chat and use one of the prompt files:

  1. Simple project (pre-seeded): Open .github/prompts/e2e-ralph-loop.prompt.md
  2. Complex project (RFP-driven): Open .github/prompts/e2e-contoso-rfp.prompt.md
  3. Post-loop analysis: Open .github/prompts/e2e-analyze-lessons.prompt.md

The E2E Conductor agent (.github/agents/e2e-conductor.agent.md) orchestrates the loop with conditional IaC routing based on session state.

The benchmark engine scores each run across 8 dimensions:

| Dimension | Weight | What It Measures |
| --- | --- | --- |
| Artifact completeness | 20% | All required step outputs exist |
| Structural compliance | 15% | Artifact template format, H2 sync, session state |
| Code quality | 20% | Bicep build/lint or Terraform validate/fmt + AVM usage |
| Review thoroughness | 10% | Challenger review passes executed per step |
| WAF coverage | 10% | All 5 Well-Architected pillars in architecture |
| Cost accuracy | 5% | Budget stated + cost estimate exists |
| Session state integrity | 10% | Schema version, project, decisions, decision_log, step completion |
| Timing performance | 10% | Duration within thresholds (3 min normal, 10 min codegen) |

The composite score is the weighted average of the eight dimension scores. Grades: A (90–100), B (80–89), C (70–79), D (60–69), F (<60).

Pass threshold: 60/100 (configurable via E2E_PASS_THRESHOLD environment variable).
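The scoring arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the benchmark engine's code: the camelCase dimension keys and function names are invented for the example, each dimension is assumed to be scored on a 0–100 scale, and fractional composites are assumed to fall into the lower grade band (89.5 counts as B). The weights, grade bands, default threshold, and `E2E_PASS_THRESHOLD` variable come from the text above.

```javascript
// Weights in percent, taken from the dimension table (they sum to 100).
const WEIGHTS = {
  artifactCompleteness: 20,
  structuralCompliance: 15,
  codeQuality: 20,
  reviewThoroughness: 10,
  wafCoverage: 10,
  costAccuracy: 5,
  sessionStateIntegrity: 10,
  timingPerformance: 10,
};

// Weighted average of per-dimension scores (each assumed to be 0-100).
function compositeScore(scores) {
  let total = 0;
  let weightSum = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    const score = scores[dimension];
    if (typeof score !== "number" || score < 0 || score > 100) {
      throw new RangeError(`Score for ${dimension} must be a number in 0-100`);
    }
    total += score * weight;
    weightSum += weight;
  }
  return total / weightSum;
}

function grade(score) {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

// Falls back to 60 when E2E_PASS_THRESHOLD is unset, empty, or not numeric.
function passThreshold(env = process.env) {
  const raw = env.E2E_PASS_THRESHOLD;
  const parsed = raw ? Number(raw) : NaN;
  return Number.isFinite(parsed) ? parsed : 60;
}

const example = compositeScore({
  artifactCompleteness: 100, structuralCompliance: 80, codeQuality: 90,
  reviewThoroughness: 70, wafCoverage: 100, costAccuracy: 50,
  sessionStateIntegrity: 100, timingPerformance: 60,
});
console.log(example, grade(example)); // 85.5 B
```

Because cost accuracy carries only 5% of the weight, a score of 50 there moves the composite by 2.5 points, while the same shortfall in code quality would cost 10.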

After running npm run e2e:benchmark, check:

  • agent-output/{project}/08-benchmark-report.md — human-readable scorecard
  • agent-output/{project}/08-benchmark-scores.json — machine-readable JSON

Self-correction events and systemic issues are captured in:

  • agent-output/{project}/09-lessons-learned.json — structured findings
  • agent-output/{project}/09-lessons-learned.md — narrative summary

Per-step attempt tracking is recorded in agent-output/{project}/08-iteration-log.json.

Cross-agent decisions are captured in the decision_log array inside 00-session-state.json. Each entry records what was decided, why, what was rejected, and which agent made the call. The benchmark scores decision_log presence as part of session state integrity. See .github/instructions/decision-logging.instructions.md for the entry schema.
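To make the four parts of an entry concrete, here is a rough sketch of a completeness check over `decision_log`. The field names `decision`, `rationale`, `rejected`, and `agent` are placeholders invented for this example; the actual schema is defined in decision-logging.instructions.md and may differ.

```javascript
// Hypothetical field names: what was decided, why, what was rejected, who decided.
const REQUIRED_FIELDS = ["decision", "rationale", "rejected", "agent"];

// Returns the names of required fields that are absent or blank in one entry.
function missingDecisionFields(entry) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = entry?.[field];
    if (Array.isArray(value)) return false; // arrays (e.g. rejected options) count as present
    return typeof value !== "string" || value.trim() === "";
  });
}

// sessionState is the parsed content of 00-session-state.json.
function auditDecisionLog(sessionState) {
  const log = Array.isArray(sessionState?.decision_log) ? sessionState.decision_log : [];
  const incomplete = log
    .map((entry, index) => ({ index, missing: missingDecisionFields(entry) }))
    .filter((result) => result.missing.length > 0);
  return { present: log.length > 0, incomplete };
}

const audit = auditDecisionLog({
  decision_log: [
    { decision: "Use Bicep", rationale: "Team expertise", rejected: ["Terraform"], agent: "architect" },
    { decision: "Single region", rejected: [], agent: "architect" }, // no rationale
  ],
});
console.log(JSON.stringify(audit));
// {"present":true,"incomplete":[{"index":1,"missing":["rationale"]}]}
```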

| Project | IaC Tool | Complexity | Description |
| --- | --- | --- | --- |
| e2e-ralph-loop | Bicep | Simple | Nordic Fresh Foods Lite (canonical) |
| terraform-e2e | Terraform | Simple | Small ecommerce storefront |
| contoso-service-hub-run-1 | Bicep | Complex | Contoso Service Hub (RFP-driven) |
| contoso-service-hub-run-2 | Bicep | Complex | Contoso Service Hub (second run) |

The IaC tool is detected from 00-session-state.json. Ensure iac_tool is set to either Bicep or Terraform in the session state.

Run terraform init -backend=false in the project directory first. The validator runs this automatically, but network issues may cause failures.

The agent may have written to the wrong output directory. Check that the project name in session state matches the --project flag.
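The project-name check described above could be scripted along these lines. This is a sketch under assumptions: the helper names are invented, the session state is assumed to expose the project name as a top-level `project` string, and an omitted `--project` flag is assumed to mean the default project `e2e-ralph-loop`.

```javascript
const DEFAULT_PROJECT = "e2e-ralph-loop"; // assumed default when no flag is given

// Extracts the value of --project=<name> from an argument list.
function parseProjectFlag(argv) {
  const prefix = "--project=";
  const flag = argv.find((arg) => arg.startsWith(prefix));
  return flag ? flag.slice(prefix.length) : DEFAULT_PROJECT;
}

// Compares the requested project with the one recorded in session state.
function checkProjectMatch(argv, sessionState) {
  const expected = parseProjectFlag(argv);
  const actual = sessionState?.project; // assumed field name
  return { expected, actual, matches: expected === actual };
}

const check = checkProjectMatch(
  ["--project=terraform-e2e", "all"],
  { project: "e2e-ralph-loop" }
);
console.log(check);
// { expected: 'terraform-e2e', actual: 'e2e-ralph-loop', matches: false }
```

A mismatch like the one shown suggests the artifacts were written under a different agent-output directory than the one being validated.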