Troubleshooting Guide

Diagnostic tools and troubleshooting

Common issues and solutions for Agentic InfraOps

Agent Codenames Quick Reference

| Agent | Codename | Common Issues |
| --- | --- | --- |
| InfraOps Conductor | 🎼 Maestro | Subagent invocation not working |
| requirements | 📜 Scribe | Not appearing in list |
| architect | 🏛️ Oracle | MCP pricing not connecting |
| bicep-planner | 📐 Strategist | Governance discovery failing |
| terraform-planner | 📐 Strategist | Governance discovery failing |
| bicep-codegen | ⚒️ Forge | Validation subagents not running |
| terraform-codegen | ⚒️ Forge | Provider version mismatches |
| bicep-deploy | 🚀 Envoy | Azure auth issues |
| terraform-deploy | 🚀 Envoy | State lock / init failures |
| challenger | ⚔️ Challenger | - |
| diagnose | 🔍 Sentinel | - |

Quick Decision Tree

flowchart TD
    START["Problem?"] --> TYPE{"What type?"}

    TYPE -->|"Agent won't start"| AGENT
    TYPE -->|"Skill not activating"| SKILL
    TYPE -->|"Deployment fails"| DEPLOY
    TYPE -->|"Bicep errors"| VALIDATE_B
    TYPE -->|"Terraform errors"| VALIDATE_T
    TYPE -->|"Azure auth"| AUTH

    AGENT --> AGENT1["Check: Ctrl+Shift+A<br/>shows agent list?"]
    AGENT1 -->|No| AGENT2["Reload VS Code window"]
    AGENT1 -->|Yes| AGENT3["Agent missing from list?<br/>Check .agent.md exists"]

    SKILL --> SKILL1["Using trigger keywords?"]
    SKILL1 -->|No| SKILL2["Add explicit keywords<br/>or reference skill by name"]
    SKILL1 -->|Yes| SKILL3["Check SKILL.md file<br/>for correct triggers"]

    DEPLOY --> DEPLOY1["Run preflight first:<br/>deploy agent preflight check"]

    VALIDATE_B --> VALIDATE_B1["Run: bicep build main.bicep<br/>bicep lint main.bicep"]
    VALIDATE_T --> VALIDATE_T1["Run: terraform validate<br/>terraform fmt -check"]

    AUTH --> AUTH1["az: az login<br/>azd: azd auth login --use-device-code"]

    style START fill:#e1f5fe
    style AGENT fill:#fff3e0
    style SKILL fill:#f3e5f5
    style DEPLOY fill:#c8e6c9
    style VALIDATE_B fill:#fce4ec
    style VALIDATE_T fill:#e8d5f5
    style AUTH fill:#fff9c4

Common Issues

1. Agent Not Appearing in List

Symptom: Ctrl+Shift+A doesn’t show the expected agent.

Causes:

Agent file not in .github/agents/ folder
YAML front matter syntax error
VS Code extension not loaded

Solutions:

# Check agent files exist
ls -la .github/agents/*.agent.md

# Inspect the YAML front matter (head only prints it; check the syntax by eye)
head -20 .github/agents/requirements.agent.md

Reload VS Code: Ctrl+Shift+P → “Developer: Reload Window”
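
To narrow down a front matter problem across many agent files, the manual `head` inspection above can be sketched as a small check. This is a minimal sketch under assumptions: the function name `check_front_matter` is hypothetical, and it only confirms that the opening and closing `---` delimiters are present, not that the YAML between them parses.

```bash
# Hypothetical helper: report whether a file opens with a "---" line
# and contains a second "---" line that closes the front matter block.
check_front_matter() {
  local file="$1" count
  if [ "$(head -n 1 "$file")" != "---" ]; then
    echo "$file: first line is not '---'"
    return 1
  fi
  # Count every line that consists only of "---" (the opening one included)
  count="$(grep -c '^---[[:space:]]*$' "$file")"
  if [ "$count" -lt 2 ]; then
    echo "$file: closing '---' not found"
    return 1
  fi
  echo "$file: front matter delimiters OK"
}
```

To run it over every agent, loop over the files: `for f in .github/agents/*.agent.md; do check_front_matter "$f"; done`.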

2. Conductor/Subagent Invocation Not Working (VS Code 1.109+)

Symptom: The InfraOps Conductor (🎼 Maestro) doesn’t delegate to specialized agents. Responses are instant, no terminal commands execute, no files are created.

Root Cause: The chat.customAgentInSubagent.enabled setting is not enabled in User Settings.

Solutions:

Enable in User Settings (not just workspace):
- Press Ctrl+, → Search for customAgentInSubagent
- Check the box to enable
- OR add to User Settings JSON:
```
{
  "chat.customAgentInSubagent.enabled": true
}
```

Verify agents have agent tool:

grep -l '"agent"' .github/agents/*.agent.md
# Should list all main agents

Verify agents have wildcard agents array:

grep 'agents:.*\["\*"\]' .github/agents/*.agent.md
# Should show agents: ["*"] in each file

Use Chat Diagnostics:
- Right-click in Chat view → “Diagnostics”
- Check all agents are loaded correctly

If the session was interrupted (no new output, truncated response):
- Check agent-output/{project}/00-session-state.json for the last completed step
- Restart the Conductor with: “Resume the workflow from step X”
- See Workflow Engine for session state details

Note: Workspace settings (.vscode/settings.json) may not be sufficient for experimental features, so enable the setting in User Settings even if the workspace file already contains it.
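
The two `grep` checks above list the files that match, which makes a single missing entry easy to overlook. A minimal sketch of the inverse check follows; the function name `agents_missing` is hypothetical. It uses `grep -L`, which prints the files that contain no matching line. Only the output is used here, since the exit status of `grep -L` has differed between GNU grep releases.

```bash
# Hypothetical helper: print the agent files that do NOT contain a pattern.
# $1 = grep pattern, $2 = agents directory (defaults to .github/agents)
agents_missing() {
  local pattern="$1" agents_dir="${2:-.github/agents}"
  grep -L -- "$pattern" "$agents_dir"/*.agent.md
}

# Usage:
# agents_missing '"agent"'             # files without the agent tool
# agents_missing 'agents:.*\["\*"\]'   # files without agents: ["*"]
```

Empty output means every agent file matched the pattern.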

3. Skill Not Activating Automatically

Symptom: A prompt doesn’t trigger the expected skill.

Causes:

Missing trigger keywords in prompt
Skill file not in .github/skills/ folder
Description doesn’t match user intent

Solutions:

Use explicit skill invocation:

"Use the azure-diagrams skill to create a diagram"

Check skill triggers in SKILL.md:

head -30 .github/skills/azure-diagrams/SKILL.md
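
When it is unclear which skill a prompt keyword should activate, searching every SKILL.md for that keyword can help. The sketch below assumes the triggers are written as plain text inside each SKILL.md; the function name `skills_mentioning` is hypothetical.

```bash
# Hypothetical helper: list SKILL.md files that mention a keyword
# (case-insensitive, fixed-string match).
# $1 = keyword, $2 = skills directory (defaults to .github/skills)
skills_mentioning() {
  local keyword="$1" skills_dir="${2:-.github/skills}"
  grep -l -i -F -- "$keyword" "$skills_dir"/*/SKILL.md
}

# Usage:
# skills_mentioning "diagram"
```

If no file is listed, the keyword probably does not appear in any skill, and naming the skill explicitly in the prompt is the safer route.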

4. Deployment Fails with Azure Policy Error

Symptom: az deployment group create fails with policy violation.

Common policies:

| Error | Cause | Solution |
| --- | --- | --- |
| “Azure AD only” | SQL Server needs AAD auth | Set `azureADOnlyAuthentication: true` |
| “Zone redundancy” | Wrong SKU tier | Use P1v4+ for App Service |
| “Missing tags” | Required tags absent | Add baseline tags (see `bicep-code-best-practices.instructions.md`) + governance extras |

Run preflight check:

"Run deployment preflight for {project}"

5. Bicep Build Errors

Symptom: bicep build fails.

**Common checks**:

```bash
# Check Bicep CLI version
bicep --version  # Should be 0.30+

# Validate syntax
bicep lint infra/bicep/{project}/main.bicep
```

**AVM module not found**:

```bash
# Restore modules from registry
bicep restore infra/bicep/{project}/main.bicep
```

5t. Terraform Validation Errors

Symptom: terraform validate or terraform plan fails.

**Common causes and solutions**:

```bash
# Check Terraform CLI version
terraform --version  # Should be 1.5+

# Initialize providers (run from project directory)
cd infra/terraform/{project}
terraform init -backend=false

# Check formatting
terraform fmt -check -recursive

# Validate configuration
terraform validate
```

**Provider version mismatch**:

```bash
# Record the selected provider versions and checksums for linux_amd64 in .terraform.lock.hcl
terraform providers lock -platform=linux_amd64
```

**AVM-TF module not found**:

Verify the module source in `main.tf` matches the Terraform Registry path:

```hcl
# Correct AVM-TF module source pattern
module "example" {
  source  = "Azure/avm-res-<provider>-<resource>/azurerm"
  version = "~> 0.x"
}
```

**TFLint errors**:

```bash
# Run TFLint with Azure ruleset
tflint --init
tflint --recursive
```

**State lock issues**:

Release a stale lock only when you are sure no other Terraform run still holds it:

terraform force-unlock <lock-id>

6. Azure Authentication Issues

Symptom: “Not logged in” or subscription errors during az or azd operations.

Azure CLI (`az`)

# Check (informational only — does NOT validate the token)
az account show --output table

# Mandatory — validate a real ARM token
az account get-access-token \
  --resource https://management.azure.com/ --output none

# Recovery
az login --use-device-code
az account set --subscription "<subscription-id>"

Azure Developer CLI (`azd`)

# Check azd auth status
azd auth login --check-status

# Login (device code works reliably in devcontainers/Codespaces)
azd auth login --use-device-code
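
The `az` token check above can be wrapped so that a failure prints the recovery command directly. This is a minimal sketch, and the function name `check_az_auth` is hypothetical. It relies only on the `az account get-access-token` call already shown, which exits non-zero when no valid token can be obtained.

```bash
# Hypothetical helper: validate a real ARM token and print the recovery
# command when validation fails.
check_az_auth() {
  if az account get-access-token \
       --resource https://management.azure.com/ --output none 2>/dev/null; then
    echo "az: ARM token OK"
  else
    echo "az: no valid ARM token, run: az login --use-device-code"
    return 1
  fi
}

# Usage:
# check_az_auth || exit 1
```

Running it before a deployment makes an expired token visible before the first ARM call fails.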

Service Principal (both `az` and `azd`)

# az (quote the variables so that special characters in the secret survive)
az login --service-principal \
  -u "$AZURE_CLIENT_ID" -p "$AZURE_CLIENT_SECRET" \
  --tenant "$AZURE_TENANT_ID"

# azd
azd auth login \
  --client-id "$AZURE_CLIENT_ID" \
  --client-secret "$AZURE_CLIENT_SECRET" \
  --tenant-id "$AZURE_TENANT_ID"

7. Artifact Validation Failures

Symptom: npm run validate fails.

Causes:

Missing required H2 headings
Headings in wrong order
Using prohibited references

Check specific artifact:

# See validation rules
grep -A20 "ARTIFACT_HEADINGS" scripts/validate-artifact-templates.mjs

Fix order issues by comparing with the template:

diff -u .github/skills/azure-artifacts/templates/01-requirements.template.md agent-output/{project}/01-requirements.md
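
A full `diff` of template against artifact also shows every body line that differs, which can bury a heading problem. A minimal sketch that compares only the H2 headings follows; the function name `compare_h2` is hypothetical. It treats any line starting with `## ` as a heading, so such lines inside fenced code blocks are counted too.

```bash
# Hypothetical helper: diff only the H2 headings of two Markdown files.
# Prints a unified diff and returns non-zero when the heading lists differ.
compare_h2() {
  local template="$1" artifact="$2"
  diff -u <(grep '^## ' "$template") <(grep '^## ' "$artifact")
}

# Usage:
# compare_h2 \
#   .github/skills/azure-artifacts/templates/01-requirements.template.md \
#   agent-output/{project}/01-requirements.md
```

No output means the H2 headings match in both name and order.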

8. MCP Server Not Responding

Symptom: Azure Pricing MCP calls fail.

Solutions:

# Check MCP configuration
cat .vscode/mcp.json

# Verify Python environment
python3 --version  # Should be 3.10+

# Install dependencies
cd mcp/azure-pricing-mcp && pip install -r requirements.txt

9. Dev Container Build Fails

Symptom: Dev container won’t start.

Common causes:

Docker not running
Port conflicts
Outdated base image

Solutions:

# Rebuild without cache
# In VS Code: Ctrl+Shift+P → "Dev Containers: Rebuild Container Without Cache"

Check Docker is running:

docker ps
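
`docker ps` fails both when the CLI is missing and when the daemon is down. A small sketch that tells the two cases apart is shown below; the function name `docker_ready` is hypothetical, and `docker info` is used because it needs a reachable daemon to succeed.

```bash
# Hypothetical helper: distinguish "CLI missing" from "daemon not reachable".
docker_ready() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH"
    return 1
  fi
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon not reachable"
    return 1
  fi
  echo "Docker is running"
}
```

Run it on the host, not inside the container, since the dev container build is started from the host's Docker daemon.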

10. Orphaned VS Code Extensions Injecting Unwanted Instructions

Symptom: Copilot loads instruction files from extensions that are not listed in devcontainer.json (e.g., ms-azuretools.vscode-azure-github-copilot). You may see unexpected rules or context being injected into agent conversations.

Cause: Extension directories can persist in ~/.vscode-server/extensions/ even after an extension is removed from the devcontainer.json extensions list. VS Code auto-loads instruction files from any extension on disk, regardless of whether it is actively managed.

Solution:

List orphaned extensions:

# Compare installed extensions against devcontainer.json
ls ~/.vscode-server/extensions/ | sort > /tmp/installed.txt
# Look for anything not in your devcontainer.json extensions list

Remove the orphaned extension directory:

rm -rf ~/.vscode-server/extensions/<orphaned-extension-folder>

Reload the VS Code window (Ctrl+Shift+P → “Developer: Reload Window”).

Note: Orphaned extensions may reappear after a dev container rebuild from a cached Docker layer. If this happens, rebuild without cache: Ctrl+Shift+P → “Dev Containers: Rebuild Container Without Cache”.
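
The comparison in the first step can be automated roughly as follows. This is a sketch under assumptions: the function name `list_orphaned_extensions` is hypothetical, extension folders are assumed to be named `<publisher>.<name>-<version>`, and an extension counts as managed when its quoted ID appears anywhere in the devcontainer.json file. Extensions that were installed as dependencies of a listed extension will also be reported, so review the output before deleting anything.

```bash
# Hypothetical helper: print extension folders whose ID is not mentioned
# in devcontainer.json.
# $1 = extensions directory, $2 = path to devcontainer.json
list_orphaned_extensions() {
  local ext_dir="$1" devcontainer="$2" dir id
  for dir in "$ext_dir"/*/; do
    [ -d "$dir" ] || continue
    dir="$(basename "$dir")"
    # Strip the trailing "-<major>.<minor>.<patch>" and anything after it
    id="$(printf '%s\n' "$dir" | sed -E 's/-[0-9]+\.[0-9]+\.[0-9]+.*$//')"
    # Extension IDs are case-insensitive, so match without regard to case
    if ! grep -qiF "\"$id\"" "$devcontainer"; then
      printf '%s\n' "$dir"
    fi
  done
}

# Usage (the devcontainer.json location is an assumption):
# list_orphaned_extensions ~/.vscode-server/extensions .devcontainer/devcontainer.json
```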

11. Git Push Fails with Lefthook Errors

Symptom: Lefthook hooks fail during commit or push.

Common hooks:

| Hook | Command | Fix |
| --- | --- | --- |
| Artifact validation | `npm run validate` | Fix H2 structure |
| Markdown lint | `npm run lint:md` | Fix markdown issues |
| Commitlint | `commitlint` | Use conventional commit format |

Skip hooks temporarily (not recommended):

git commit --no-verify -m "fix: temporary"
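
For the Commitlint row above, a commit subject can be checked locally before committing. The sketch below is an approximation, not the project's actual rule set: the function name `is_conventional_commit` is hypothetical, and the list of types is assumed to match the common conventional preset (`build`, `chore`, `ci`, `docs`, `feat`, `fix`, `perf`, `refactor`, `revert`, `style`, `test`). The repository's commitlint configuration remains authoritative.

```bash
# Hypothetical helper: succeed when a subject line looks like
# "type(optional-scope)!: description".
is_conventional_commit() {
  local subject="$1"
  local pattern='^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)(\([a-z0-9._/-]+\))?!?: .+'
  [[ "$subject" =~ $pattern ]]
}

# Usage:
# is_conventional_commit "fix(bicep): correct tag names" && echo "looks valid"
```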

12. Handoff Prompt Not Working

Symptom: Agent handoff button does nothing.

Causes:

Handoff target agent doesn’t exist
YAML handoffs section malformed

Check handoffs syntax:

handoffs:
  - label: "Create WAF Assessment"
    agent: architect
    prompt: "Assess requirements for WAF..."
    send: true

Ensure target agent exists:

ls .github/agents/03-architect.agent.md
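
Checking every handoff target by hand does not scale across many agents. The sketch below automates the `ls` check under stated assumptions: the function name `check_handoff_targets` is hypothetical, indented `agent:` keys are assumed to occur only in handoffs entries, target names are assumed to contain no spaces, and a target counts as present when some file named `*<target>.agent.md` exists in the same folder.

```bash
# Hypothetical helper: report handoff targets that have no agent file.
# $1 = agents directory (defaults to .github/agents)
check_handoff_targets() {
  local agents_dir="${1:-.github/agents}" file target status=0
  for file in "$agents_dir"/*.agent.md; do
    [ -f "$file" ] || continue
    # Read the value of every indented "agent:" key in this file
    while IFS= read -r target; do
      if ! compgen -G "$agents_dir/*${target}.agent.md" >/dev/null; then
        echo "$(basename "$file"): handoff target '$target' has no agent file"
        status=1
      fi
    done < <(sed -nE 's/^[[:space:]]+(-[[:space:]]+)?agent:[[:space:]]*"?([A-Za-z0-9_-]+)"?[[:space:]]*$/\2/p' "$file")
  done
  return "$status"
}
```

If handoffs in this repository reference an agent's display name instead of its file name, the glob will need adjusting.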

Diagnostic Commands

Environment Check

# All-in-one status
echo "=== Bicep ===" && bicep --version
echo "=== Terraform ===" && terraform --version
echo "=== TFLint ===" && tflint --version
echo "=== Azure CLI ===" && az version --output table
echo "=== Node ===" && node --version
echo "=== Python ===" && python3 --version
echo "=== Git ===" && git --version

Workspace Validation

# Validate all artifacts
npm run validate:all

# Bicep validation
bicep lint infra/bicep/{project}/main.bicep
bicep build infra/bicep/{project}/main.bicep

# Terraform validation (the subshell keeps later commands in the repository root)
(cd infra/terraform/{project} && terraform init -backend=false && terraform validate)
npm run validate:terraform

# Lint markdown
npm run lint:md

Azure Status

# Current subscription
az account show --output table

# List resource groups
az group list --output table

# Check deployments
az deployment group list -g {resource-group} --output table

Getting Help

Check the Prompt Guide for usage examples
Read agent definitions: .github/agents/*.agent.md
Check skill files: .github/skills/*/SKILL.md
Review templates: .github/skills/azure-artifacts/templates/

Still Stuck?

Use the diagnose agent (🔍 Sentinel):

Ctrl+Shift+A → diagnose
"My bicep-codegen agent isn't generating valid templates"

Or start the InfraOps Conductor (🎼 Maestro) for a guided workflow:

Ctrl+Shift+I → InfraOps Conductor
"Help me troubleshoot my Azure deployment"
