Troubleshooting Guide

Common issues and solutions for Agentic InfraOps
Agent Codenames Quick Reference
| Agent | Codename | Common Issues |
|---|---|---|
| InfraOps Conductor | 🎼 Maestro | Subagent invocation not working |
| requirements | 📜 Scribe | Not appearing in list |
| architect | 🏛️ Oracle | MCP pricing not connecting |
| bicep-planner | 📐 Strategist | Governance discovery failing |
| terraform-planner | 📐 Strategist | Governance discovery failing |
| bicep-codegen | ⚒️ Forge | Validation subagents not running |
| terraform-codegen | ⚒️ Forge | Provider version mismatches |
| bicep-deploy | 🚀 Envoy | Azure auth issues |
| terraform-deploy | 🚀 Envoy | State lock / init failures |
| challenger | ⚔️ Challenger | — |
| diagnose | 🔍 Sentinel | — |
Quick Decision Tree
flowchart TD
START["Problem?"] --> TYPE{"What type?"}
TYPE -->|"Agent won't start"| AGENT
TYPE -->|"Skill not activating"| SKILL
TYPE -->|"Deployment fails"| DEPLOY
TYPE -->|"Bicep errors"| VALIDATE_B
TYPE -->|"Terraform errors"| VALIDATE_T
TYPE -->|"Azure auth"| AUTH
AGENT --> AGENT1["Check: Ctrl+Shift+A<br/>shows agent list?"]
AGENT1 -->|No| AGENT2["Reload VS Code window"]
AGENT1 -->|Yes| AGENT3["Agent missing from list?<br/>Check .agent.md exists"]
SKILL --> SKILL1["Using trigger keywords?"]
SKILL1 -->|No| SKILL2["Add explicit keywords<br/>or reference skill by name"]
SKILL1 -->|Yes| SKILL3["Check SKILL.md file<br/>for correct triggers"]
DEPLOY --> DEPLOY1["Run preflight first:<br/>deploy agent preflight check"]
VALIDATE_B --> VALIDATE_B1["Run: bicep build main.bicep<br/>bicep lint main.bicep"]
VALIDATE_T --> VALIDATE_T1["Run: terraform validate<br/>terraform fmt -check"]
AUTH --> AUTH1["az: az login<br/>azd: azd auth login --use-device-code"]
style START fill:#e1f5fe
style AGENT fill:#fff3e0
style SKILL fill:#f3e5f5
style DEPLOY fill:#c8e6c9
style VALIDATE_B fill:#fce4ec
style VALIDATE_T fill:#e8d5f5
style AUTH fill:#fff9c4
Common Issues
1. Agent Not Appearing in List
Symptom: Ctrl+Shift+A doesn’t show the expected agent.
Causes:
- Agent file not in the `.github/agents/` folder
- YAML front matter syntax error
- VS Code extension not loaded
Solutions:

    # Check agent files exist
    ls -la .github/agents/*.agent.md

    # Inspect YAML front matter (first 20 lines)
    head -20 .github/agents/requirements.agent.md

Reload VS Code: Ctrl+Shift+P → “Developer: Reload Window”
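Checking the front matter by eye gets tedious once there are several agents. The sketch below flags agent files whose front matter delimiters look wrong. The function name `check_front_matter` is hypothetical, and it assumes front matter opens with `---` on line 1 and closes with a second `---` line. It checks delimiters only and does not parse the YAML.

```bash
# Sketch: report agent files whose YAML front matter delimiters look wrong.
# Assumption: front matter starts with '---' on line 1 and is closed by a
# second '---' line. Delimiters only; the YAML itself is not parsed.
check_front_matter() {
  dir="${1:-.github/agents}"
  status=0
  for f in "$dir"/*.agent.md; do
    # An unmatched glob stays literal, so bail out if nothing exists.
    [ -e "$f" ] || { echo "no agent files in $dir"; return 1; }
    first_line=$(head -n 1 "$f")
    # grep -c exits non-zero on zero matches but still prints 0.
    delimiters=$(grep -c '^---[[:space:]]*$' "$f" || true)
    if [ "$first_line" != "---" ] || [ "$delimiters" -lt 2 ]; then
      echo "BAD  $f"
      status=1
    else
      echo "OK   $f"
    fi
  done
  return "$status"
}
```

Run `check_front_matter` from the repository root, or pass a directory as the first argument. A `BAD` line narrows down which file to open, but a file can pass this check and still contain invalid YAML.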
2. Conductor/Subagent Invocation Not Working (VS Code 1.109+)
Symptom: The InfraOps Conductor (🎼 Maestro) doesn’t delegate to specialized agents. Responses are instant, no terminal commands execute, no files are created.
Root Cause: The `chat.customAgentInSubagent.enabled` setting is not enabled in User Settings.
Solutions:
1. Enable in User Settings (not just workspace):
   - Press `Ctrl+,` → Search for `customAgentInSubagent`
   - Check the box to enable
   - OR add to User Settings JSON:

         {"chat.customAgentInSubagent.enabled": true}

2. Verify agents have `agent` tool:

       grep -l '"agent"' .github/agents/*.agent.md
       # Should list all main agents

3. Verify agents have wildcard `agents` array:

       grep 'agents:.*\["\*"\]' .github/agents/*.agent.md
       # Should show agents: ["*"] in each file

4. Use Chat Diagnostics:
   - Right-click in Chat view → “Diagnostics”
   - Check all agents are loaded correctly
5. If the session was interrupted (no new output, truncated response):
   - Check `agent-output/{project}/00-session-state.json` for the last completed step
   - Restart the Conductor with: “Resume the workflow from step X”
   - See Workflow Engine for session state details

Note: Workspace settings (`.vscode/settings.json`) may not be sufficient for experimental features, so enable the setting in User Settings even if the workspace already sets it.
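To confirm the flag from a terminal rather than the Settings UI, a text search over the settings file is usually enough. This is a minimal sketch: the function name `subagent_setting_enabled` is hypothetical, and the file path is passed in because the User Settings location depends on the platform.

```bash
# Sketch: check whether a VS Code settings file enables the subagent flag.
# The path is an argument because the User Settings location varies by
# platform. grep is used instead of a JSON parser because settings.json
# may contain comments; a commented-out entry would also match, so treat
# a positive result as a hint rather than proof.
subagent_setting_enabled() {
  settings_file="$1"
  [ -f "$settings_file" ] || { echo "missing: $settings_file"; return 2; }
  if grep -Eq '"chat\.customAgentInSubagent\.enabled"[[:space:]]*:[[:space:]]*true' "$settings_file"; then
    echo "enabled: $settings_file"
  else
    echo "not enabled: $settings_file"
    return 1
  fi
}
```

On a Linux desktop install the user file is typically `~/.config/Code/User/settings.json`. In a dev container or remote session the effective file may live elsewhere, so check the path before relying on the result.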
3. Skill Not Activating Automatically
Symptom: Prompt doesn’t trigger expected skill.
Causes:
- Missing trigger keywords in prompt
- Skill file not in `.github/skills/` folder
- Description doesn’t match user intent
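For the third cause, it can help to see every skill's description side by side and compare the wording with your prompt. The sketch below assumes each `SKILL.md` has a `description:` line in its front matter. That key, and the function name `list_skill_descriptions`, are assumptions, so adjust the pattern if the files are laid out differently.

```bash
# Sketch: print "<skill>: <description>" for every skill folder.
# Assumption: SKILL.md has a 'description:' line in its front matter.
list_skill_descriptions() {
  skills_dir="${1:-.github/skills}"
  for f in "$skills_dir"/*/SKILL.md; do
    # An unmatched glob stays literal, so bail out if nothing exists.
    [ -e "$f" ] || { echo "no SKILL.md files under $skills_dir"; return 1; }
    skill=$(basename "$(dirname "$f")")
    # Take the first description line and drop the key itself.
    desc=$(sed -n 's/^description:[[:space:]]*//p' "$f" | head -n 1)
    echo "$skill: ${desc:-<no description line found>}"
  done
}
```

A skill that prints `<no description line found>` is a likely candidate for never activating automatically.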
Solutions:
Use explicit skill invocation:

    "Use the azure-diagrams skill to create a diagram"

Check skill triggers in SKILL.md:

    head -30 .github/skills/azure-diagrams/SKILL.md

4. Deployment Fails with Azure Policy Error
Symptom: `az deployment group create` fails with policy violation.
Common policies:
| Error | Cause | Solution |
|---|---|---|
| “Azure AD only” | SQL Server needs AAD auth | Set `azureADOnlyAuthentication: true` |
| “Zone redundancy” | Wrong SKU tier | Use P1v4+ for App Service |
| “Missing tags” | Required tags absent | Add baseline tags (see `bicep-code-best-practices.instructions.md`) + governance extras |
Run preflight check:

    "Run deployment preflight for {project}"

5. Bicep Build Errors
Symptom: `bicep build` fails.
=== "Bicep"
**Common causes**:
```bash
# Check Bicep CLI version
bicep --version  # Should be 0.30+

# Validate syntax
bicep lint infra/bicep/{project}/main.bicep
```
**AVM module not found**:
```bash
# Restore modules from registry
bicep restore infra/bicep/{project}/main.bicep
```

5t. Terraform Validation Errors
Symptom: `terraform validate` or `terraform plan` fails.
=== "Terraform"
**Common causes and solutions**:
```bash
# Check Terraform CLI version
terraform --version  # Should be 1.5+

# Initialize providers (run from project directory)
cd infra/terraform/{project}
terraform init -backend=false

# Check formatting
terraform fmt -check -recursive

# Validate configuration
terraform validate
```
**Provider version mismatch**:
```bash
# Lock providers to specific versions
terraform providers lock -platform=linux_amd64
```
**AVM-TF module not found**:
Verify the module source in `main.tf` matches the Terraform Registry path:
```hcl
# Correct AVM-TF module source pattern
module "example" {
  source  = "Azure/avm-res-<provider>-<resource>/azurerm"
  version = "~> 0.x" # placeholder: pin to a published version
}
```
**TFLint errors**:
```bash
# Run TFLint with Azure ruleset
tflint --init
tflint --recursive
```
**State lock issues**: run `terraform force-unlock <lock-id>`, but only after confirming that no other Terraform process still holds the lock.

6. Azure Authentication Issues
Symptom: “Not logged in” or subscription errors during `az` or `azd` operations.
Azure CLI (az)

    # Check (informational only; does NOT validate the token)
    az account show --output table

    # Mandatory: validate a real ARM token
    az account get-access-token \
      --resource https://management.azure.com/ --output none

    # Recovery
    az login --use-device-code
    az account set --subscription "<subscription-id>"

Azure Developer CLI (azd)

    # Check azd auth status
    azd auth login --check-status

    # Login (device code works reliably in devcontainers/Codespaces)
    azd auth login --use-device-code

Service Principal (both az and azd)

    # az
    az login --service-principal \
      -u "$AZURE_CLIENT_ID" -p "$AZURE_CLIENT_SECRET" \
      --tenant "$AZURE_TENANT_ID"

    # azd
    azd auth login \
      --client-id "$AZURE_CLIENT_ID" \
      --client-secret "$AZURE_CLIENT_SECRET" \
      --tenant-id "$AZURE_TENANT_ID"

7. Artifact Validation Failures
Symptom: `npm run validate` fails.
Causes:
- Missing required H2 headings
- Headings in wrong order
- Using prohibited references
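The first two causes concern H2 headings, so a heading-only comparison against the template can narrow things down before reading a full diff. This is a rough sketch with hypothetical function names (`h2_headings`, `compare_h2`). It is not the project's validator and does not look for prohibited references.

```bash
# Sketch: compare the H2 headings of an artifact with those of its template.
# Headings are matched as lines starting with "## "; a "## " line inside a
# fenced code block would be counted too.
h2_headings() {
  grep '^## ' "$1" || true
}

compare_h2() {
  template="$1"
  artifact="$2"
  t_tmp=$(mktemp)
  a_tmp=$(mktemp)
  h2_headings "$template" > "$t_tmp"
  h2_headings "$artifact" > "$a_tmp"
  # diff exits 0 only when both heading lists match in content and order.
  rc=0
  diff -u "$t_tmp" "$a_tmp" || rc=$?
  rm -f "$t_tmp" "$a_tmp"
  return "$rc"
}
```

For example, `compare_h2 .github/skills/azure-artifacts/templates/01-requirements.template.md agent-output/{project}/01-requirements.md` prints nothing when the headings line up and a unified diff of the heading lists when they do not.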
Check specific artifact:

    # See validation rules
    grep -A20 "ARTIFACT_HEADINGS" scripts/validate-artifact-templates.mjs

Fix order issues by comparing with the template:

    diff -u .github/skills/azure-artifacts/templates/01-requirements.template.md \
      agent-output/{project}/01-requirements.md

8. MCP Server Not Responding
Symptom: Azure Pricing MCP calls fail.
Solutions:

    # Check MCP configuration
    cat .vscode/mcp.json

    # Verify Python environment
    python3 --version  # Should be 3.10+

    # Install dependencies
    cd mcp/azure-pricing-mcp && pip install -r requirements.txt

9. Dev Container Build Fails
Symptom: Dev container won’t start.
Common causes:
- Docker not running
- Port conflicts
- Outdated base image
Solutions:
Rebuild without cache. In VS Code: Ctrl+Shift+P → “Dev Containers: Rebuild Container Without Cache”
Check Docker is running:

    docker ps

10. Orphaned VS Code Extensions Injecting Unwanted Instructions
Symptom: Copilot loads instruction files from extensions that are not listed in `devcontainer.json` (e.g., `ms-azuretools.vscode-azure-github-copilot`). You may see unexpected rules or context being injected into agent conversations.
Cause: Extension directories can persist in `~/.vscode-server/extensions/` even after an extension is removed from the `devcontainer.json` extensions list. VS Code auto-loads instruction files from any extension on disk, regardless of whether it is actively managed.
Solution:
1. List orphaned extensions:

       # Compare installed extensions against devcontainer.json
       ls ~/.vscode-server/extensions/ | sort > /tmp/installed.txt
       # Look for anything not in your devcontainer.json extensions list

2. Remove the orphaned extension directory:

       rm -rf ~/.vscode-server/extensions/<orphaned-extension-folder>

3. Reload the VS Code window (`Ctrl+Shift+P` → “Developer: Reload Window”).

Note: Orphaned extensions may reappear after a dev container rebuild from a cached Docker layer. If this happens, rebuild without cache: `Ctrl+Shift+P` → “Dev Containers: Rebuild Container Without Cache”.
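The manual comparison in the first step can be partly automated. The sketch below lists extension folders whose ID does not appear anywhere in `devcontainer.json`. It assumes folders are named `<publisher>.<name>-<version>` with an optional platform suffix and that the file sits at `.devcontainer/devcontainer.json`. Both are assumptions, as is the function name `find_orphaned_extensions`.

```bash
# Sketch: list extension folders whose ID is not mentioned in devcontainer.json.
# Assumptions: folders are named <publisher>.<name>-<version>[-<platform>],
# and a case-insensitive text match against devcontainer.json is good enough.
# Nothing is deleted; the function only prints candidate folder names.
find_orphaned_extensions() {
  ext_dir="${1:-$HOME/.vscode-server/extensions}"
  devcontainer="${2:-.devcontainer/devcontainer.json}"
  [ -f "$devcontainer" ] || { echo "missing: $devcontainer" >&2; return 2; }
  for d in "$ext_dir"/*/; do
    [ -d "$d" ] || continue
    folder=$(basename "$d")
    # Strip the trailing -<version> (and any platform suffix) to get the ID.
    id=$(printf '%s\n' "$folder" | sed -E 's/-[0-9]+\.[0-9]+\.[0-9]+.*$//')
    if ! grep -qiF "\"$id\"" "$devcontainer"; then
      echo "$folder"
    fi
  done
}
```

Review the output before removing anything. Extensions installed through other routes, such as dev container features or user-level defaults, would also show up here even though they are not orphaned.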
11. Git Push Fails with Lefthook Errors
Symptom: Pre-commit hooks fail.
Common hooks:
| Hook | Command | Fix |
|---|---|---|
| Artifact validation | npm run validate | Fix H2 structure |
| Markdown lint | npm run lint:md | Fix markdown issues |
| Commitlint | commitlint | Use conventional commit format |
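For the Commitlint row, a quick local check of the subject line can save a failed commit. The sketch below approximates the conventional format `type(optional-scope): description`. The list of types is an assumption based on commonly used ones, and the function name `is_conventional_commit` is hypothetical. The repository's own commitlint configuration remains authoritative.

```bash
# Sketch: rough check that a commit subject follows the conventional format
# "type(optional-scope): description". The type list is an assumption; the
# repository's commitlint config decides what is actually accepted.
is_conventional_commit() {
  printf '%s\n' "$1" | grep -Eq \
    '^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]+\))?!?: .+'
}
```

The function returns success for subjects such as `fix: temporary` or `feat(bicep): add storage module`, and failure for free-form subjects such as `Updated some files`.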
Skip hooks temporarily (not recommended):

    git commit --no-verify -m "fix: temporary"

12. Handoff Prompt Not Working
Symptom: Agent handoff button does nothing.
Causes:
- Handoff target agent doesn’t exist
- YAML handoffs section malformed
Check handoffs syntax:

    handoffs:
      - label: "Create WAF Assessment"
        agent: architect
        prompt: "Assess requirements for WAF..."
        send: true

Ensure target agent exists:

    ls .github/agents/03-architect.agent.md

Diagnostic Commands
Environment Check

    # All-in-one status
    echo "=== Bicep ===" && bicep --version
    echo "=== Terraform ===" && terraform --version
    echo "=== TFLint ===" && tflint --version
    echo "=== Azure CLI ===" && az version --output table
    echo "=== Node ===" && node --version
    echo "=== Python ===" && python3 --version
    echo "=== Git ===" && git --version

Workspace Validation

    # Validate all artifacts
    npm run validate:all

    # Bicep validation
    bicep lint infra/bicep/{project}/main.bicep
    bicep build infra/bicep/{project}/main.bicep

    # Terraform validation (subshell keeps the working directory unchanged)
    (cd infra/terraform/{project} && terraform init -backend=false && terraform validate)
    npm run validate:terraform

    # Lint markdown
    npm run lint:md

Azure Status

    # Current subscription
    az account show --output table

    # List resource groups
    az group list --output table

    # Check deployments
    az deployment group list -g {resource-group} --output table

Getting Help
- Check prompt guide: Prompt Guide has usage examples
- Read agent definitions: `.github/agents/*.agent.md`
- Check skill files: `.github/skills/*/SKILL.md`
- Review templates: `.github/skills/azure-artifacts/templates/`
Still Stuck?
Use the diagnose agent (🔍 Sentinel):

    Ctrl+Shift+A → diagnose
    "My bicep-codegen agent isn't generating valid templates"

Or start the InfraOps Conductor (🎼 Maestro) for a guided workflow:

    Ctrl+Shift+I → InfraOps Conductor
    "Help me troubleshoot my Azure deployment"