
Troubleshooting Guide

Diagnostic tools and troubleshooting

Common issues and solutions for Agentic InfraOps

| Agent | Codename | Common Issues |
| --- | --- | --- |
| InfraOps Conductor | 🎼 Maestro | Subagent invocation not working |
| requirements | 📜 Scribe | Not appearing in list |
| architect | 🏛️ Oracle | MCP pricing not connecting |
| bicep-planner | 📐 Strategist | Governance discovery failing |
| terraform-planner | 📐 Strategist | Governance discovery failing |
| bicep-codegen | ⚒️ Forge | Validation subagents not running |
| terraform-codegen | ⚒️ Forge | Provider version mismatches |
| bicep-deploy | 🚀 Envoy | Azure auth issues |
| terraform-deploy | 🚀 Envoy | State lock / init failures |
| challenger | ⚔️ Challenger | |
| diagnose | 🔍 Sentinel | |

flowchart TD
    START["Problem?"] --> TYPE{"What type?"}

    TYPE -->|"Agent won't start"| AGENT
    TYPE -->|"Skill not activating"| SKILL
    TYPE -->|"Deployment fails"| DEPLOY
    TYPE -->|"Bicep errors"| VALIDATE_B
    TYPE -->|"Terraform errors"| VALIDATE_T
    TYPE -->|"Azure auth"| AUTH

    AGENT --> AGENT1["Check: Ctrl+Shift+A<br/>shows agent list?"]
    AGENT1 -->|No| AGENT2["Reload VS Code window"]
    AGENT1 -->|Yes| AGENT3["Agent missing from list?<br/>Check .agent.md exists"]

    SKILL --> SKILL1["Using trigger keywords?"]
    SKILL1 -->|No| SKILL2["Add explicit keywords<br/>or reference skill by name"]
    SKILL1 -->|Yes| SKILL3["Check SKILL.md file<br/>for correct triggers"]

    DEPLOY --> DEPLOY1["Run preflight first:<br/>deploy agent preflight check"]

    VALIDATE_B --> VALIDATE_B1["Run: bicep build main.bicep<br/>bicep lint main.bicep"]
    VALIDATE_T --> VALIDATE_T1["Run: terraform validate<br/>terraform fmt -check"]

    AUTH --> AUTH1["az: az login<br/>azd: azd auth login --use-device-code"]

    style START fill:#e1f5fe
    style AGENT fill:#fff3e0
    style SKILL fill:#f3e5f5
    style DEPLOY fill:#c8e6c9
    style VALIDATE_B fill:#fce4ec
    style VALIDATE_T fill:#e8d5f5
    style AUTH fill:#fff9c4

Symptom: Ctrl+Shift+A doesn’t show expected agent.

Causes:

  • Agent file not in .github/agents/ folder
  • YAML front matter syntax error
  • VS Code extension not loaded

Solutions:

Terminal window
# Check agent files exist
ls -la .github/agents/*.agent.md
# Validate YAML front matter
head -20 .github/agents/requirements.agent.md

Reload VS Code: Ctrl+Shift+P → “Developer: Reload Window”
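
Reading the first 20 lines by eye catches most problems, but an unclosed front matter block is easy to miss. The sketch below is one way to automate that single check. The function name `check_front_matter` is hypothetical, and the check only confirms that the `---` delimiters are present; it does not parse the YAML between them.

```bash
# Hypothetical helper: succeeds when the file opens with '---' and a
# second '---' line closes the front matter block.
# Limitation: a '---' horizontal rule in the body also counts as a closer.
check_front_matter() {
  local file="$1"
  [ "$(head -n 1 "$file")" = "---" ] || return 1
  [ "$(grep -c '^---[[:space:]]*$' "$file")" -ge 2 ]
}
```

To scan every agent, loop over `.github/agents/*.agent.md` and print the names of files for which the function fails.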

2. Conductor/Subagent Invocation Not Working (VS Code 1.109+)


Symptom: The InfraOps Conductor (🎼 Maestro) doesn’t delegate to specialized agents. Responses are instant, no terminal commands execute, no files are created.

Root Cause: The chat.customAgentInSubagent.enabled setting is not enabled in User Settings.

Solutions:

  1. Enable in User Settings (not just workspace):

    • Press Ctrl+, → Search for customAgentInSubagent
    • Check the box to enable
    • OR add to User Settings JSON:
    {
    "chat.customAgentInSubagent.enabled": true
    }
  2. Verify agents have agent tool:

    Terminal window
    grep -l '"agent"' .github/agents/*.agent.md
    # Should list all main agents
  3. Verify agents have wildcard agents array:

    Terminal window
    grep 'agents:.*\["\*"\]' .github/agents/*.agent.md
    # Should show agents: ["*"] in each file
  4. Use Chat Diagnostics:

    • Right-click in Chat view → “Diagnostics”
    • Check all agents are loaded correctly
  5. If the session was interrupted (no new output, truncated response):

    • Check agent-output/{project}/00-session-state.json for the last completed step
    • Restart the Conductor with: “Resume the workflow from step X”
    • See Workflow Engine for session state details

Note: Workspace settings (.vscode/settings.json) may not be sufficient for experimental features, so enable the setting in User Settings even if the workspace file already contains it.
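
To confirm the setting from a terminal rather than the Settings UI, a plain text search of the settings file is usually enough. This is a minimal sketch: the function name `subagent_setting_enabled` is hypothetical, and the path to the User Settings file is passed in because it varies by platform (on a Linux desktop it is typically `~/.config/Code/User/settings.json`; remote and dev container setups differ).

```bash
# Hypothetical helper: succeeds when the given settings file sets
# chat.customAgentInSubagent.enabled to true.
# Uses a text match because settings.json may contain comments.
subagent_setting_enabled() {
  grep -Eq '"chat\.customAgentInSubagent\.enabled"[[:space:]]*:[[:space:]]*true' "$1"
}
```

For example, `subagent_setting_enabled ~/.config/Code/User/settings.json || echo "not enabled"`.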

Symptom: Prompt doesn’t trigger expected skill.

Causes:

  • Missing trigger keywords in prompt
  • Skill file not in .github/skills/ folder
  • Description doesn’t match user intent

Solutions:

Use explicit skill invocation:

"Use the azure-diagrams skill to create a diagram"

Check skill triggers in SKILL.md:

Terminal window
head -n 30 .github/skills/azure-diagrams/SKILL.md
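
When it is unclear which skill a keyword should trigger, searching every SKILL.md at once can be quicker than opening them one by one. The sketch below assumes skills live under `.github/skills/<name>/SKILL.md` as described above; the function name `skills_mentioning` is hypothetical.

```bash
# Hypothetical helper: lists SKILL.md files that mention a keyword
# (case-insensitive). $1: keyword, $2: skills root (default .github/skills).
skills_mentioning() {
  local keyword="$1" root="${2:-.github/skills}"
  find "$root" -name SKILL.md -exec grep -il -- "$keyword" {} +
}
```

An empty result for a keyword you expected to work suggests the prompt wording and the skill description have drifted apart.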

4. Deployment Fails with Azure Policy Error


Symptom: az deployment group create fails with policy violation.

Common policies:

| Error | Cause | Solution |
| --- | --- | --- |
| "Azure AD only" | SQL Server needs AAD auth | Set azureADOnlyAuthentication: true |
| "Zone redundancy" | Wrong SKU tier | Use P1v4+ for App Service |
| "Missing tags" | Required tags absent | Add baseline tags (see bicep-code-best-practices.instructions.md) + governance extras |

Run preflight check:

"Run deployment preflight for {project}"

Symptom: bicep build fails.


**Common causes**:
```bash
# Check Bicep CLI version
bicep --version # Should be 0.30+
# Validate syntax
bicep lint infra/bicep/{project}/main.bicep
```
**AVM module not found**:
```bash
# Restore modules from registry
bicep restore infra/bicep/{project}/main.bicep
```

Symptom: terraform validate or terraform plan fails.


**Common causes and solutions**:
```bash
# Check Terraform CLI version
terraform --version # Should be 1.5+
# Initialize providers (run from project directory)
cd infra/terraform/{project}
terraform init -backend=false
# Check formatting
terraform fmt -check -recursive
# Validate configuration
terraform validate
```
**Provider version mismatch**:
```bash
# Lock providers to specific versions
terraform providers lock -platform=linux_amd64
```
**AVM-TF module not found**:
Verify the module source in `main.tf` matches the Terraform Registry path:
```hcl
# Correct AVM-TF module source pattern
module "example" {
  source  = "Azure/avm-res-<provider>-<resource>/azurerm"
  version = "~> 0.x"
}
```
**TFLint errors**:
```bash
# Run TFLint with Azure ruleset
tflint --init
tflint --recursive
```

State lock issues:

Terminal window
terraform force-unlock <lock-id>
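
The lock ID needed by `terraform force-unlock` appears in the "Lock Info" block of the failed command's output. The sketch below pulls it out of saved output. The function name `lock_id_from_output` is hypothetical, and the parser assumes the usual `ID:` line under `Lock Info:`; the exact layout can differ between Terraform versions and backends. Only force-unlock when you are sure no other run still holds the lock.

```bash
# Hypothetical helper: reads Terraform output on stdin and prints the
# value that follows "ID:" inside the "Lock Info:" block.
lock_id_from_output() {
  awk '
    /Lock Info:/ { in_lock = 1; next }
    in_lock {
      for (i = 1; i < NF; i++) {
        if ($i == "ID:") { print $(i + 1); exit }
      }
    }
  '
}
```

For example, `terraform plan 2>&1 | tee plan.log`, then `lock_id_from_output < plan.log`.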

Symptom: “Not logged in” or subscription errors during az or azd operations.

Terminal window
# Check (informational only; does NOT validate the token)
az account show --output table
# Mandatory: validate a real ARM token
az account get-access-token \
  --resource https://management.azure.com/ --output none
# Recovery
az login --use-device-code
az account set --subscription "<subscription-id>"

Terminal window
# Check azd auth status
azd auth login --check-status
# Login (device code works reliably in devcontainers/Codespaces)
azd auth login --use-device-code

Terminal window
# Service principal login: az
az login --service-principal \
  -u "$AZURE_CLIENT_ID" -p "$AZURE_CLIENT_SECRET" \
  --tenant "$AZURE_TENANT_ID"
# Service principal login: azd
azd auth login \
  --client-id "$AZURE_CLIENT_ID" \
  --client-secret "$AZURE_CLIENT_SECRET" \
  --tenant-id "$AZURE_TENANT_ID"
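
Because `az account show` can succeed with an expired token, scripts that gate a deployment are better off testing the token request itself. The sketch below wraps that check. The function name `check_arm_token` and its output strings are hypothetical; the `az` command is the same one shown above.

```bash
# Hypothetical helper: prints "token-ok" when a real ARM token can be
# obtained, otherwise prints "token-missing" and returns non-zero.
check_arm_token() {
  if az account get-access-token \
      --resource https://management.azure.com/ --output none 2>/dev/null; then
    echo "token-ok"
  else
    echo "token-missing"
    return 1
  fi
}
```

A deployment script could call `check_arm_token || az login --use-device-code` before running any `az deployment` command.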

Symptom: npm run validate fails.

Causes:

  • Missing required H2 headings
  • Headings in wrong order
  • Using prohibited references

Check specific artifact:

Terminal window
# See validation rules
grep -A20 "ARTIFACT_HEADINGS" scripts/validate-artifact-templates.mjs

Fix order issues: Compare with template:

Terminal window
diff -u .github/skills/azure-artifacts/templates/01-requirements.template.md agent-output/{project}/01-requirements.md
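
A full diff against the template includes every body line, which can bury the heading differences. The sketch below compares only the H2 lines of the two files. The function name `compare_h2` is hypothetical, and the validator may enforce rules beyond heading order, so treat a clean result as a hint, not a pass.

```bash
# Hypothetical helper: diffs the '## ' heading lines of a template ($1)
# against an artifact ($2). Succeeds when the headings and order match.
compare_h2() {
  local expected actual status
  expected=$(mktemp)
  actual=$(mktemp)
  grep -E '^## ' "$1" > "$expected" || true
  grep -E '^## ' "$2" > "$actual" || true
  diff -u "$expected" "$actual"
  status=$?
  rm -f "$expected" "$actual"
  return "$status"
}
```

Lines prefixed with `-` in the output are headings the template expects; lines prefixed with `+` are what the artifact has instead.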

Symptom: Azure Pricing MCP calls fail.

Solutions:

Terminal window
# Check MCP configuration
cat .vscode/mcp.json
# Verify Python environment
python3 --version # Should be 3.10+
# Install dependencies
cd mcp/azure-pricing-mcp && pip install -r requirements.txt

Symptom: Dev container won’t start.

Common causes:

  • Docker not running
  • Port conflicts
  • Outdated base image

Solutions:

Terminal window
# Rebuild without cache
# In VS Code: Ctrl+Shift+P → "Dev Containers: Rebuild Container Without Cache"

Check Docker is running:

Terminal window
docker ps

10. Orphaned VS Code Extensions Injecting Unwanted Instructions


Symptom: Copilot loads instruction files from extensions that are not listed in devcontainer.json (e.g., ms-azuretools.vscode-azure-github-copilot). You may see unexpected rules or context being injected into agent conversations.

Cause: Extension directories can persist in ~/.vscode-server/extensions/ even after an extension is removed from the devcontainer.json extensions list. VS Code auto-loads instruction files from any extension on disk, regardless of whether it is actively managed.

Solution:

  1. List orphaned extensions:

    Terminal window
    # Compare installed extensions against devcontainer.json
    ls ~/.vscode-server/extensions/ | sort > /tmp/installed.txt
    # Look for anything not in your devcontainer.json extensions list
  2. Remove the orphaned extension directory:

    Terminal window
    rm -rf ~/.vscode-server/extensions/<orphaned-extension-folder>
  3. Reload the VS Code window (Ctrl+Shift+P → “Developer: Reload Window”).

Note: Orphaned extensions may reappear after a dev container rebuild from a cached Docker layer. If this happens, rebuild without cache: Ctrl+Shift+P → “Dev Containers: Rebuild Container Without Cache”.
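
The comparison in step 1 can be scripted. The sketch below strips the version suffix from each extension folder name and reports IDs that do not appear in the dev container config. The function name `list_unlisted_extensions` is hypothetical, the config path is passed in (commonly `.devcontainer/devcontainer.json`, which is an assumption here), and the folder naming `<publisher>.<name>-<version>` is assumed. Extensions installed as dependencies of listed ones will also show up, so review the output before deleting anything.

```bash
# Hypothetical helper: prints extension IDs found on disk but not quoted
# in the config file. $1: extensions directory, $2: devcontainer.json.
list_unlisted_extensions() {
  local ext_dir="$1" config="$2" dir id
  for dir in "$ext_dir"/*/; do
    [ -d "$dir" ] || continue
    # Drop the trailing "-<major>.<minor>.<patch>[-platform]" suffix.
    id=$(basename "$dir" | sed -E 's/-[0-9]+\.[0-9]+\.[0-9]+.*$//')
    grep -qiF -- "\"$id\"" "$config" || printf '%s\n' "$id"
  done
}
```

For example, `list_unlisted_extensions ~/.vscode-server/extensions .devcontainer/devcontainer.json`.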

Symptom: Pre-commit hooks fail.

Common hooks:

| Hook | Command | Fix |
| --- | --- | --- |
| Artifact validation | npm run validate | Fix H2 structure |
| Markdown lint | npm run lint:md | Fix markdown issues |
| Commitlint | commitlint | Use conventional commit format |

Skip hooks temporarily (not recommended):

Terminal window
git commit --no-verify -m "fix: temporary"
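
Rather than skipping hooks, it is often faster to check the commit subject before committing. The sketch below tests a message against the type list used by the common conventional-commit preset. The function name `is_conventional_commit` is hypothetical, and the project's own commitlint configuration may allow a different set of types or add further rules.

```bash
# Hypothetical helper: succeeds when the first line of the message looks
# like "type(scope)!: subject" with a commonly used type.
is_conventional_commit() {
  printf '%s\n' "$1" | head -n 1 | grep -Eq \
    '^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)(\([^)]+\))?!?: .+'
}
```

For example, `is_conventional_commit "fix: correct tag casing" && echo ok`.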

Symptom: Agent handoff button does nothing.

Causes:

  • Handoff target agent doesn’t exist
  • YAML handoffs section malformed

Check handoffs syntax:

handoffs:
  - label: "Create WAF Assessment"
    agent: architect
    prompt: "Assess requirements for WAF..."
    send: true

Ensure target agent exists:

Terminal window
ls .github/agents/03-architect.agent.md
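
Checking one target at a time does not scale once several agents declare handoffs. The sketch below collects every `agent:` value from the agent files and reports targets with no matching file. The function name `missing_handoff_targets` is hypothetical, and two assumptions are baked in: `agent:` keys appear only inside `handoffs` entries, and a target named `architect` resolves to a file ending in `architect.agent.md` (with or without a numeric prefix).

```bash
# Hypothetical helper: prints handoff targets that have no agent file.
# $1: agents directory (for example .github/agents).
missing_handoff_targets() {
  local agents_dir="$1" target
  grep -hE '^[[:space:]]*agent:[[:space:]]*' "$agents_dir"/*.agent.md 2>/dev/null \
    | sed -E 's/^[[:space:]]*agent:[[:space:]]*//' \
    | tr -d "\"'" \
    | sed -E 's/[[:space:]]+$//' \
    | sort -u \
    | while IFS= read -r target; do
        [ -n "$target" ] || continue
        ls "$agents_dir"/*"$target".agent.md > /dev/null 2>&1 \
          || printf '%s\n' "$target"
      done
}
```

An empty result means every declared target resolved to a file under these assumptions.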

Terminal window
# All-in-one status
echo "=== Bicep ===" && bicep --version
echo "=== Terraform ===" && terraform --version
echo "=== TFLint ===" && tflint --version
echo "=== Azure CLI ===" && az version --output table
echo "=== Node ===" && node --version
echo "=== Python ===" && python3 --version
echo "=== Git ===" && git --version
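
The status commands print versions but leave the comparison against the minimums mentioned in this guide (Bicep 0.30+, Terraform 1.5+, Python 3.10+) to the reader. The sketch below adds that comparison. The function names `first_version` and `version_ge` are hypothetical, and `sort -V` assumes GNU coreutils or a compatible `sort`.

```bash
# Hypothetical helper: prints the first x.y.z version found on stdin.
first_version() {
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: succeeds when version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge "$(terraform --version | first_version)" 1.5.0 || echo "Terraform too old"`.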

Terminal window
# Validate all artifacts
npm run validate:all
# Bicep validation
bicep lint infra/bicep/{project}/main.bicep
bicep build infra/bicep/{project}/main.bicep
# Terraform validation
# Subshell keeps the working directory unchanged for the npm commands below
(cd infra/terraform/{project} && terraform init -backend=false && terraform validate)
npm run validate:terraform
# Lint markdown
npm run lint:md

Terminal window
# Current subscription
az account show --output table
# List resource groups
az group list --output table
# Check deployments
az deployment group list -g {resource-group} --output table

  1. Check prompt guide: Prompt Guide has usage examples
  2. Read agent definitions: .github/agents/*.agent.md
  3. Check skill files: .github/skills/*/SKILL.md
  4. Review templates: .github/skills/azure-artifacts/templates/

Use the diagnose agent (🔍 Sentinel):

Ctrl+Shift+A → diagnose
"My bicep-codegen agent isn't generating valid templates"

Or start the InfraOps Conductor (🎼 Maestro) for a guided workflow:

Ctrl+Shift+I → InfraOps Conductor
"Help me troubleshoot my Azure deployment"