How Agentic InfraOps Works¶
A comprehensive guide to the multi-agent orchestration system for Azure infrastructure development.
Executive Summary¶
Agentic InfraOps is a multi-agent orchestration system where specialised AI agents collaborate through a structured 8-step workflow to transform Azure infrastructure requirements into deployed, production-grade Infrastructure as Code. The system coordinates 16 top-level agents and 11 subagents through mandatory human approval gates, producing Bicep or Terraform templates that conform to Azure Well-Architected Framework principles, Azure Verified Modules standards, and organisational governance policies. The agents are supported by reusable skills, instruction files, Copilot hooks, and MCP server integrations.
The core thesis is that AI agents can reliably produce production-grade Azure infrastructure when properly orchestrated with guardrails. The system achieves this through a layered knowledge architecture (agents, skills, instructions, registries), mechanical enforcement of invariants via automated validation scripts, and a human-in-the-loop design that preserves operator control at every critical decision point. Cost governance (budget alerts, forecast notifications, anomaly detection) and template repeatability (zero hardcoded values) are enforced as first-class concerns across all generated infrastructure.
- System Architecture
The 8-step workflow, Conductor pattern, and dual IaC tracks (Bicep & Terraform).
- Core Concepts
Agents, Skills, Instructions, and Configuration Registries — the knowledge layers.
- Agent Architecture
16 top-level agents, 11 subagents, the Challenger pattern, and handoff design.
- Skills & Instructions
Progressive skill loading, glob-based instruction enforcement, and the skill catalog.
- Workflow Engine & Quality
DAG model, approval gates, session state, validators, Copilot hooks, and circuit breakers.
- MCP Integration
Three MCP servers plus one VS Code extension: GitHub, Azure Pricing, Terraform Registry, and Azure MCP (extension).
Intellectual Foundations¶
This project draws directly from two bodies of work that define how autonomous AI agents can operate reliably in professional software engineering contexts.
Harness Engineering (OpenAI)¶
In February 2026, OpenAI published "Harness Engineering: Leveraging Codex in an Agent-First World," describing how a small team built and shipped an internal product with zero lines of manually written code. Every line — application logic, tests, CI configuration, documentation, and internal tooling — was generated by Codex agents. The key insights that shaped this project:
Repository as the system of record. Knowledge that lives in Google Docs, chat threads,
or people's heads is invisible to agents. Only versioned, in-repo artifacts — code, markdown,
schemas, execution plans — exist from the agent's perspective. This project implements this
principle by storing all agent outputs in agent-output/{project}/, all conventions in skills
and instructions, and all decisions in Architecture Decision Records.
Map, not manual. OpenAI initially tried a monolithic AGENTS.md approach and found it
failed: context is a scarce resource, and a giant instruction file crowds out the task.
Instead, they treat AGENTS.md as a table of contents that points to deeper sources.
This project adopts the same pattern: AGENTS.md is approximately 250 lines and points to
skills, instruction files, and multiple configuration registries.
Enforce invariants, not implementations. Rather than prescribing step-by-step procedures, the Harness Engineering approach encodes strict boundaries (architectural layering rules, naming conventions, security requirements) and lets agents choose their own path within those constraints. This project enforces invariants mechanically: validation scripts check naming conventions, template compliance, governance references, and architectural rules.
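As a minimal sketch of what such a mechanical check can look like: the validator below enforces a naming invariant without prescribing which names an agent must pick. The regex, prefixes, and environment suffixes are illustrative assumptions, not this project's actual rules.

```javascript
// Hypothetical naming-convention validator sketch — the project's real
// validation scripts and naming rules may differ.
const NAME_PATTERN = /^(rg|st|kv|app)-[a-z0-9]+-(dev|test|prod)$/;

function validateResourceNames(names) {
  // Return every name that violates the invariant; an empty array means pass.
  return names.filter((name) => !NAME_PATTERN.test(name));
}

const violations = validateResourceNames([
  "rg-payments-prod", // conforms to the boundary
  "StoragePayments",  // violates: no prefix, uppercase, no environment suffix
]);
console.log(violations);
```

The agent remains free to choose any name inside the boundary; the script only rejects names outside it.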
Human taste gets encoded. When a human reviewer catches a pattern issue, the fix is not to patch the output — it is to update the instruction or skill that should have prevented the issue. Over time, human judgment compounds in the system as linter rules, templates, and skill updates.
Garbage collection through continuous enforcement. Technical debt in an agent-generated system accumulates the same way it does in human-generated systems, but faster. The Harness Engineering approach runs recurring agents that scan for deviations and open targeted refactoring pull requests. This project implements a quarterly context audit checklist and weekly documentation freshness checks.
Bosun (VirtEngine)¶
Bosun is an open-source, production-grade control plane for autonomous software engineering. Originally named OpenFleet, Bosun routes work across multiple AI executors (Codex, Copilot, Claude, OpenCode), automates retries and failover, manages PR lifecycles, and provides operator control through a Telegram Mini App dashboard. Key concepts adopted from Bosun:
What are .mjs files?
Files ending in .mjs are ECMAScript Modules — JavaScript files that use the
modern import/export syntax (as opposed to .cjs which uses require()).
Bosun's codebase is written as Node.js ESM modules. References like
shared-state-manager.mjs point to specific source files in the Bosun repository.
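A minimal ESM example (a hypothetical file, not taken from the Bosun codebase) shows the syntax in question:

```javascript
// state.mjs — hypothetical module illustrating ESM export/import syntax.
// A CommonJS (.cjs) equivalent would use module.exports and require().
export function makeClaim(ownerId) {
  return { owner_id: ownerId, heartbeat: Date.now() };
}

// A consumer module would load it with:
//   import { makeClaim } from "./state.mjs";
console.log(makeClaim("agent-1").owner_id);
```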
Distributed shared state with claim-based locking. Bosun's shared-state-manager.mjs
implements heartbeat-based liveness detection and claim tokens to prevent concurrent agents
from double-writing the same task. This project's session state schema v2.0 directly
adapts this pattern with lock.owner_id, lock.heartbeat, lock.attempt_token, and
per-step claim objects.
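The field names below come from the schema described above; the surrounding structure and example values are illustrative assumptions, not the actual v2.0 schema:

```json
{
  "lock": {
    "owner_id": "conductor-7f3a",
    "heartbeat": "2025-06-01T10:42:00Z",
    "attempt_token": "a1b2c3"
  },
  "steps": {
    "step-3": {
      "claim": {
        "owner_id": "engineer-91d0",
        "claimed_at": "2025-06-01T10:41:12Z"
      }
    }
  }
}
```

An agent refuses to write a step whose claim it does not hold, and a stale heartbeat lets another agent safely take over.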
Workflow engine as a DAG. Bosun's workflow-engine.mjs and workflow-nodes.mjs
define workflow execution as a directed acyclic graph with typed nodes, conditional edges,
and fan-out patterns. This project's workflow-graph.json encodes the 8-step pipeline
as a machine-readable DAG with agent-step, gate, subagent-fan-out, and validation
node types.
What is a DAG?
A DAG (Directed Acyclic Graph) is a graph where edges have a direction and there are no cycles — meaning you can never follow the arrows back to where you started. In workflow engines, a DAG models task dependencies: each step points to the steps that must come after it, guaranteeing a clear execution order with no infinite loops.
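The four node types are named above; the node IDs and edge schema in this sketch are hypothetical, so the real workflow-graph.json likely differs in shape:

```json
{
  "nodes": [
    { "id": "step-1-requirements", "type": "agent-step" },
    { "id": "gate-1-approval", "type": "gate" },
    { "id": "challenger-reviews", "type": "subagent-fan-out" },
    { "id": "template-validation", "type": "validation" }
  ],
  "edges": [
    { "from": "step-1-requirements", "to": "gate-1-approval" },
    { "from": "gate-1-approval", "to": "challenger-reviews" },
    { "from": "challenger-reviews", "to": "template-validation" }
  ]
}
```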
Context shredding. Bosun's context-shredding-config.mjs implements tiered context
compression to manage token budgets across long-running sessions. This project's
context-shredding skill defines three compression tiers (full, summarized, minimal)
with per-artifact compression templates.
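The three tier names come from the skill described above; the token budgets and template fields below are illustrative assumptions:

```json
{
  "tiers": {
    "full": { "max_tokens": null },
    "summarized": { "max_tokens": 2000, "template": "summarize-artifact.md" },
    "minimal": { "max_tokens": 300, "template": "one-line-status.md" }
  }
}
```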
Circuit breaker and anomaly detection. Bosun's anomaly-detector.mjs and
error-detector.mjs detect stalled loops and repeated failures, triggering escalation.
This project's circuit breaker pattern (in iac-common/references/circuit-breaker.md)
defines a failure taxonomy, detection thresholds, and mandatory stopping rules.
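A toy circuit breaker conveys the mechanism: after a threshold of consecutive failures the breaker opens and refuses further attempts until reset. The threshold value and API here are assumptions for illustration; the project's actual detection thresholds and stopping rules live in circuit-breaker.md.

```javascript
// Hypothetical circuit-breaker sketch, not the project's implementation.
class CircuitBreaker {
  constructor(threshold = 3) {
    this.threshold = threshold; // consecutive failures before opening
    this.failures = 0;
    this.open = false;
  }
  record(success) {
    if (success) { this.failures = 0; return; }
    this.failures += 1;
    if (this.failures >= this.threshold) this.open = true; // mandatory stop
  }
  canAttempt() {
    return !this.open; // once open, escalate to a human instead of retrying
  }
}

const breaker = new CircuitBreaker(3);
[false, false, false].forEach((ok) => breaker.record(ok));
console.log(breaker.canAttempt()); // false — the stalled loop is broken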
Smart PR lifecycle. Bosun auto-labels PRs with bosun-needs-fix when CI fails
and merges passing PRs through a watchdog with a mandatory review gate. This project's
Smart PR Flow adapts the same pattern with infraops-ci-pass / infraops-needs-fix labels
and deploy agent integration.
Diff-based pre-push hooks. Bosun's .githooks/ directory implements targeted
validation that only runs checks for changed file domains. This project's pre-push hook
in lefthook.yml categorises changed files and runs only matching validators in parallel.
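A sketch of what a diff-scoped, parallel pre-push configuration can look like in lefthook (the glob patterns and script names are hypothetical, not this project's actual hook definitions):

```yaml
# Hypothetical lefthook.yml fragment — real validator names and globs differ.
pre-push:
  parallel: true
  commands:
    bicep-validate:
      glob: "*.bicep"
      run: ./scripts/validate-bicep.sh {push_files}
    terraform-validate:
      glob: "*.tf"
      run: ./scripts/validate-terraform.sh {push_files}
    docs-freshness:
      glob: "docs/**/*.md"
      run: ./scripts/check-doc-freshness.sh {push_files}
```

Because each command declares a glob, a push touching only documentation never pays for IaC validation, and vice versa.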
Built-in tools catalog with skill affinity. Bosun's agent-custom-tools.mjs includes
a BUILTIN_TOOLS catalog that maps which tools and skills each agent profile needs.
This project's skill-affinity.json maps each agent to skills with affinity weights
(primary, secondary, never).
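The affinity levels (primary, secondary, never) are as described above; the agent and skill names in this sketch are hypothetical:

```json
{
  "bicep-engineer": {
    "iac-common": "primary",
    "avm-modules": "primary",
    "cost-governance": "secondary",
    "terraform-style": "never"
  }
}
```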
Agent prompts registry. Bosun's agent-prompts.mjs provides a machine-readable
registry of agent configurations. This project's agent-registry.json maps each agent
role to its definition file, default model, and required skills.
Ralph (Snarktank)¶
Ralph is an autonomous AI agent loop (12k+ GitHub stars) based on Geoffrey Huntley's Ralph pattern. It spawns fresh AI coding tool instances (Amp or Claude Code) in a bash loop, picking off PRD user stories one at a time until all items pass. Key concepts adopted from Ralph:
Fresh-context iteration model. Each Ralph iteration spawns a brand-new
AI instance with zero carry-over context. The only memory between iterations
is git history, a progress.txt append-only learning log, and a prd.json
task list. This project adopts the same philosophy through its session-resume
skill: each agent step is stateless, and all memory persists through versioned
artefact files in agent-output/{project}/ and the machine-readable
00-session-state.json.
Right-sized task decomposition. Ralph insists that each PRD item must be small enough to complete within a single context window — "Add a database column" not "Build the entire dashboard." This project enforces the same principle at a different scale: each of the 8 workflow steps is scoped to a single well-defined output (one requirements doc, one architecture assessment, one implementation plan), and subagents are further decomposed to atomic validation or review tasks.
AGENTS.md as compounding knowledge. Ralph treats AGENTS.md updates as
critical: after each iteration the AI appends discovered patterns, gotchas,
and conventions so that future iterations (and human developers) benefit.
This project elevates the same pattern to a first-class system: AGENTS.md
is the table of contents, skills contain deep domain knowledge, and
instructions encode discovered conventions as enforceable rules. Golden
Principle 7 — "Human Taste Gets Encoded" — directly mirrors Ralph's
append-only learning loop.
Feedback loops as mandatory infrastructure. Ralph only works when typecheck catches errors, tests verify behaviour, and CI stays green — otherwise broken code compounds across iterations. This project's 28 validation scripts, pre-commit/pre-push hooks, and circuit breaker pattern serve the identical function: mechanical feedback loops that prevent error propagation across agent steps.
Deterministic stop conditions. Ralph exits when all user stories have
passes: true. This project's workflow engine defines explicit gate
conditions: each step transition requires either human approval or automated
validation pass, and the Conductor agent tracks completion state in the
session state file.
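The stop condition above can be illustrated with a toy loop: iteration continues only while incomplete stories remain, and memory between iterations lives solely in files. The prd.json schema and the simulated "agent" step are assumptions for illustration; a real Ralph iteration would spawn a fresh AI instance instead of the sed command.

```shell
# Toy illustration of a deterministic stop condition (hypothetical schema).
cat > prd.json <<'EOF'
{"stories": [{"id": 1, "passes": false}, {"id": 2, "passes": true}]}
EOF

i=0
while grep -q '"passes": false' prd.json && [ "$i" -lt 10 ]; do
  # A real iteration would spawn a fresh AI instance here, e.g.
  #   an agent told to complete one story and mark it passes: true.
  # We simulate that outcome directly:
  sed -i 's/"passes": false/"passes": true/' prd.json
  i=$((i + 1))
done
echo "Loop exited after $i iteration(s); all stories pass."
```

The max-iteration guard (`-lt 10`) mirrors Ralph's cap on runaway loops: the loop halts deterministically whether the work converges or not.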
How This Project Synthesises All Three¶
Harness Engineering provides the philosophy: treat the repository as the single source of truth, encode human taste into mechanical rules, enforce invariants rather than implementations, and manage context as a scarce resource.
Bosun provides the engineering patterns: distributed state with claims, DAG-based workflow execution, complexity routing, context compression, circuit breakers, and PR automation.
Ralph provides the execution model: stateless iteration loops, right-sized task decomposition, append-only learning, mandatory feedback loops, and deterministic stop conditions.
This project weaves all three into a system purpose-built for Azure infrastructure:
| Concern | Harness Engineering Principle | Bosun Pattern | Ralph Pattern | This Project |
|---|---|---|---|---|
| Knowledge management | Repo is system of record | Shared knowledge base | AGENTS.md + progress.txt | Skills + instructions + agent-output/ |
| Context management | Map, not manual | Context shredding | Fresh context per iteration | Progressive skill loading + 3-tier compression |
| Quality enforcement | Mechanical enforcement of invariants | Pre-push hooks + anomaly detection | Mandatory CI feedback loops | Validators + pre-commit/push hooks + Copilot hooks |
| Workflow orchestration | Structured step progression | Workflow engine DAG | Bash loop + prd.json task list | workflow-graph.json + Conductor agent |
| Concurrency safety | — | Claim-based locking | Single-instance sequential loop | Session state v2.0 with lock/claim model |
| Task decomposition | — | — | One context window per story | One artefact per workflow step |
| Cost optimisation | — | — | — | Model tier selection via Conductor |
| Failure resilience | — | Circuit breaker + anomaly detection | CI-gated iteration | Failure taxonomy + stopping rules |
| Learning persistence | Human taste gets encoded | — | Append-only progress.txt | Skills + instructions evolve over time |
| Human control | Human taste gets encoded | Mandatory review gates | Max iterations cap | 5 approval gates + challenger reviews |
| Cost governance | Enforce invariants, not impls | — | — | iac-cost-repeatability instruction + adversarial checklists |
| Context efficiency | Context is scarce | Context shredding | Fresh context per iteration | Session Break Protocol at Gates 2 & 3 + conditional pass 3 |
Golden Principles¶
The system operates under 10 principles adapted from the Harness Engineering philosophy:
- Repository Is the System of Record — All context lives in-repo
- Map, Not Manual — Instructions point to deeper sources; no monolithic docs
- Enforce Invariants, Not Implementations — Set boundaries, allow autonomy within them
- Parse at Boundaries — Validate inputs and outputs at module edges
- AVM-First, Security Baseline Always — Azure Verified Modules and security defaults
- Golden Path Pattern — Shared utilities over hand-rolled helpers
- Human Taste Gets Encoded — Review feedback becomes rules, not one-off fixes
- Context Is Scarce — Every token must earn its keep
- Progressive Disclosure — Start small, drill deeper when needed
- Mechanical Enforcement Over Documentation — Linters and validators over prose
References¶
- Harness Engineering: openai.com/index/harness-engineering — OpenAI's account of building a product with zero manually written code
- Bosun: github.com/virtengine/bosun — Open-source control plane for autonomous software engineering
- Ralph: github.com/snarktank/ralph — Autonomous AI agent loop based on Geoffrey Huntley's Ralph pattern
- Azure Well-Architected Framework: learn.microsoft.com/azure/well-architected
- Azure Verified Modules: aka.ms/AVM
- Azure Cloud Adoption Framework: learn.microsoft.com/azure/cloud-adoption-framework