Troubleshooting Guide

Common issues and solutions for APEX.
Agent Codenames Quick Reference
| Agent | Codename | Common Issues |
|---|---|---|
| Orchestrator | 🧠 Orchestrator | Subagent invocation not working |
| requirements | 📜 Scribe | Not appearing in list |
| architect | 🏛️ Oracle | MCP pricing not connecting |
| iac-planner | 📐 Strategist | Governance discovery failing |
| bicep-codegen | ⚒️ Forge | Validation subagents not running |
| terraform-codegen | ⚒️ Forge | Provider version mismatches |
| bicep-deploy | 🚀 Envoy | Azure auth issues |
| terraform-deploy | 🚀 Envoy | State lock / init failures |
| challenger | ⚔️ Challenger | — |
| diagnose | 🔍 Sentinel | — |
Quick Decision Tree
Before you start troubleshooting, confirm whether you are running inside the dev container or directly on your local machine. Setup fixes differ: container problems usually point to Docker or forwarded settings, while local problems usually point to missing CLIs or environment variables.
```mermaid
flowchart TD
    START["Problem?"] --> TYPE{"What type?"}
    TYPE -->|"Agent won't start"| AGENT
    TYPE -->|"Skill not activating"| SKILL
    TYPE -->|"Deployment fails"| DEPLOY
    TYPE -->|"Bicep errors"| VALIDATE_B
    TYPE -->|"Terraform errors"| VALIDATE_T
    TYPE -->|"Azure auth"| AUTH
    AGENT --> AGENT1["Check: Ctrl+Shift+A<br/>shows agent list?"]
    AGENT1 -->|No| AGENT2["Reload VS Code window"]
    AGENT1 -->|Yes| AGENT3["Agent missing from list?<br/>Check .agent.md exists"]
    SKILL --> SKILL1["Using trigger keywords?"]
    SKILL1 -->|No| SKILL2["Add explicit keywords<br/>or reference skill by name"]
    SKILL1 -->|Yes| SKILL3["Check SKILL.md file<br/>for correct triggers"]
    DEPLOY --> DEPLOY1["Run preflight first:<br/>deploy agent preflight check"]
    VALIDATE_B --> VALIDATE_B1["Run: bicep build main.bicep<br/>bicep lint main.bicep"]
    VALIDATE_T --> VALIDATE_T1["Run: terraform validate<br/>terraform fmt -check"]
    AUTH --> AUTH1["az: az login<br/>azd: azd auth login --use-device-code"]
    style START fill:#e1f5fe
    style AGENT fill:#fff3e0
    style SKILL fill:#f3e5f5
    style DEPLOY fill:#c8e6c9
    style VALIDATE_B fill:#fce4ec
    style VALIDATE_T fill:#e8d5f5
    style AUTH fill:#fff9c4
```
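The container-versus-local check and the search for missing CLIs can be scripted before walking the tree. The sketch below is a heuristic and not part of APEX: both function names are hypothetical, and the container test assumes that Docker's `/.dockerenv` marker file or the `REMOTE_CONTAINERS` / `CODESPACES` environment variables are present in your setup.

```bash
# Heuristic: succeed when the shell appears to run inside a container.
# Assumption: /.dockerenv exists, or REMOTE_CONTAINERS / CODESPACES is set.
in_container() {
  [ -f /.dockerenv ] || [ -n "${REMOTE_CONTAINERS:-}" ] || [ -n "${CODESPACES:-}" ]
}

# Print each named CLI that is not on PATH, one per line.
missing_clis() {
  for cli in "$@"; do
    command -v "$cli" >/dev/null 2>&1 || echo "$cli"
  done
}
```

For example, `missing_clis az azd bicep terraform tflint node python3 git` prints only the tools that still need installing, and `in_container && echo "dev container"` gives a rough hint about which branch of fixes applies.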
Common Issues
1. Agent Not Appearing in List
Symptom: Ctrl+Shift+A doesn’t show the expected agent.
Causes:
- Agent file not in the `.github/agents/` folder
- YAML front matter syntax error
- VS Code extension not loaded
Solutions:
```bash
# Check agent files exist
ls -la .github/agents/*.agent.md

# Validate YAML front matter
head -20 .github/agents/requirements.agent.md
```
Reload VS Code: Ctrl+Shift+P → “Developer: Reload Window”
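Inspecting front matter by eye gets tedious with many agents. As a minimal sketch (hypothetical helper names, not an APEX script), the functions below flag agent files whose front matter delimiters are missing. They check only the `---` lines, not the YAML between them, and assume Unix line endings.

```bash
# Succeed when a file opens with a '---' line and contains a second '---'
# line that closes the front matter block. Delimiters only, not YAML syntax.
has_front_matter() {
  [ "$(head -n 1 "$1")" = "---" ] &&
    [ "$(grep -c '^---$' "$1")" -ge 2 ]
}

# Print every *.agent.md file in directory $1 that fails the delimiter check.
agents_without_front_matter() {
  for f in "$1"/*.agent.md; do
    [ -e "$f" ] || continue
    has_front_matter "$f" || echo "$f"
  done
}
```

Running `agents_without_front_matter .github/agents` prints nothing when every file has both delimiters.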
2. Orchestrator/Subagent Invocation Not Working (VS Code 1.109+)
Symptom: The Orchestrator (🧠 Orchestrator) doesn’t delegate to specialized agents. Responses are instant, no terminal commands execute, no files are created.
Root Cause: The chat.customAgentInSubagent.enabled setting is not enabled in
User Settings.
Solutions:
1. Enable in User Settings (not just workspace):
   - Press `Ctrl+,` → Search for `customAgentInSubagent`
   - Check the box to enable
   - OR add to User Settings JSON:
     ```json
     {"chat.customAgentInSubagent.enabled": true}
     ```
2. Verify agents have the `agent` tool:
   ```bash
   grep -l '"agent"' .github/agents/*.agent.md
   # Should list all main agents
   ```
3. Verify agents have the wildcard `agents` array:
   ```bash
   grep 'agents:.*\["\*"\]' .github/agents/*.agent.md
   # Should show agents: ["*"] in each file
   ```
4. Use Chat Diagnostics:
   - Right-click in Chat view → “Diagnostics”
   - Check all agents are loaded correctly
5. If the session was interrupted (no new output, truncated response):
   - Check `agent-output/{project}/00-session-state.json` for the last completed step
   - Restart the Orchestrator with: “Resume the workflow from step X”
   - See Workflow Engine for session state details
Note: Workspace settings (.vscode/settings.json) may not be sufficient
for experimental features, so enable the setting in User Settings as well.
If the workflow already produced files before failing, resume from the same step instead of restarting the whole run. Open the failing artifact, collect the exact validation output, and feed that back into the parent agent.
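The two `grep` checks in this section list the files that match, which leaves you to work out which agents are missing. A small sketch that inverts the check is shown below. The function name is hypothetical, and it assumes the same `.agent.md` layout and patterns used in the commands above.

```bash
# Print the *.agent.md files in directory $1 that do NOT contain pattern $2.
# grep -L lists non-matching files; its exit status is ignored so that an
# empty result (every file matches) is not treated as an error.
agents_missing() {
  grep -L -- "$2" "$1"/*.agent.md 2>/dev/null || true
}
```

For example, `agents_missing .github/agents '"agent"'` prints the agents without the `agent` tool, and `agents_missing .github/agents 'agents:.*\["\*"\]'` prints those without the wildcard array. No output means every file passed.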
3. Skill Not Activating Automatically
Symptom: Prompt doesn’t trigger the expected skill.
Causes:
- Missing trigger keywords in prompt
- Skill file not in the `.github/skills/` folder
- Description doesn’t match user intent
Solutions:
Use explicit skill invocation:
```text
"Use the drawio skill to create a diagram"
```
Check skill triggers in SKILL.md:
```bash
head -30 .github/skills/drawio/SKILL.md
```
4. Deployment Fails with Azure Policy Error
Symptom: az deployment group create fails with a policy violation.
Common policies:
| Error | Cause | Solution |
|---|---|---|
| “Azure AD only” | SQL Server needs AAD auth | Set azureADOnlyAuthentication: true |
| “Zone redundancy” | Wrong SKU tier | Use P1v4+ for App Service |
| “Missing tags” | Required tags absent | Add baseline tags (see iac-bicep-best-practices.instructions.md or iac-terraform-best-practices.instructions.md) + governance extras |
Run preflight check:
```text
"Run deployment preflight for {project}"
```
5. Bicep Build Errors
Symptom: bicep build fails.
**Common causes**:
```bash
# Check Bicep CLI version
bicep --version  # Should be 0.30+

# Validate syntax
bicep lint infra/bicep/{project}/main.bicep
```
**AVM module not found**:
```bash
# Restore modules from registry
bicep restore infra/bicep/{project}/main.bicep
```
5t. Terraform Validation Errors
Symptom: terraform validate or terraform plan fails.
**Common causes and solutions**:
```bash
# Check Terraform CLI version
terraform --version  # Should be 1.5+

# Initialize providers (run from project directory)
cd infra/terraform/{project}
terraform init -backend=false

# Check formatting
terraform fmt -check -recursive

# Validate configuration
terraform validate
```
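The minimum versions in this guide (Bicep 0.30+, Terraform 1.5+, Python 3.10+) are easy to misjudge by eye, since `0.4` sorts after `0.30` as plain text. A minimal sketch of a numeric comparison follows. The helper names are hypothetical, it assumes `sort -V` from GNU coreutils is available, and it assumes the first dotted number in a tool's version output is the version itself.

```bash
# Read text on stdin and print the first dotted number found, e.g. 1.5.7.
first_version() {
  grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Succeed when version $1 is greater than or equal to minimum $2.
# sort -V orders version strings numerically field by field, so the minimum
# must come out first (or tie) for the check to pass.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

A possible use is `version_ge "$(terraform --version | first_version)" 1.5 || echo "Terraform is older than 1.5"`, with the same pattern for `bicep --version` and `python3 --version`.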
**Provider version mismatch**:
```bash
# Lock providers to specific versions
terraform providers lock -platform=linux_amd64
```
**AVM-TF module not found**:
Verify the module source in `main.tf` matches the Terraform Registry path:
```hcl
# Correct AVM-TF module source pattern
module "example" {
  source  = "Azure/avm-res-<provider>-<resource>/azurerm"
  version = "~> 0.x"
}
```
**TFLint errors**:
```bash
# Run TFLint with Azure ruleset
tflint --init
tflint --recursive
```
**State lock issues**:
```bash
terraform force-unlock <lock-id>
```
6. Azure Authentication Issues
Symptom: “Not logged in” or subscription errors during az or azd operations.
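The az commands in this section follow a check-then-recover pattern: validate a real ARM token, and log in again only when that fails. A minimal sketch of that pattern as reusable functions is shown below. The function names are hypothetical; the `az` subcommands and flags are the ones listed in this section.

```bash
# Succeed only when az can obtain a real ARM token. `az account show` is
# informational and does not validate the token, so it is not used here.
az_token_valid() {
  az account get-access-token \
    --resource https://management.azure.com/ --output none 2>/dev/null
}

# Validate the token and fall back to a device-code login when it fails.
ensure_az_login() {
  az_token_valid || az login --use-device-code
}
```

Calling `ensure_az_login` at the top of a deployment script keeps a stale session from surfacing later as a confusing deployment error. You still need `az account set --subscription "<subscription-id>"` if the wrong subscription is active.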
Azure CLI (az)
```bash
# Check (informational only; does NOT validate the token)
az account show --output table

# Mandatory: validate a real ARM token
az account get-access-token \
  --resource https://management.azure.com/ --output none

# Recovery
az login --use-device-code
az account set --subscription "<subscription-id>"
```
Azure Developer CLI (azd)
```bash
# Check azd auth status
azd auth login --check-status

# Login (device code works reliably in devcontainers/Codespaces)
azd auth login --use-device-code
```
Service Principal (both az and azd)
```bash
# az
az login --service-principal \
  -u "$AZURE_CLIENT_ID" -p "$AZURE_CLIENT_SECRET" \
  --tenant "$AZURE_TENANT_ID"

# azd
azd auth login \
  --client-id "$AZURE_CLIENT_ID" \
  --client-secret "$AZURE_CLIENT_SECRET" \
  --tenant-id "$AZURE_TENANT_ID"
```
7. Artifact Validation Failures
Symptom: npm run validate fails.
Causes:
- Missing required H2 headings
- Headings in wrong order
- Using prohibited references
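Missing or misordered H2 headings are easier to spot when only the headings are compared, rather than the full files. The sketch below does that with two hypothetical helpers. It is not the project's validator, whose actual rules live in `scripts/_lib/artifact-headings.mjs`, and it will also pick up any `## ` line that sits inside a fenced code block.

```bash
# Print the H2 headings of a Markdown file in document order.
list_h2() {
  grep '^## ' "$1" || true
}

# Show a unified diff of H2 headings between template $1 and artifact $2.
# The exit status follows diff: 0 when the headings and their order match.
compare_h2() (
  tmp=$(mktemp -d)
  list_h2 "$1" > "$tmp/template.txt"
  list_h2 "$2" > "$tmp/artifact.txt"
  status=0
  diff -u "$tmp/template.txt" "$tmp/artifact.txt" || status=$?
  rm -rf "$tmp"
  exit "$status"
)
```

Pass the template as the first argument and the generated artifact as the second. Lines prefixed with `-` in the output are headings the artifact lacks or has in a different position.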
Check specific artifact:
```bash
# See validation rules
grep -A20 "ARTIFACT_HEADINGS" scripts/_lib/artifact-headings.mjs
```
Fix order issues: Compare with template:
```bash
diff -u .github/skills/azure-artifacts/templates/01-requirements.template.md agent-output/{project}/01-requirements.md
```
8. MCP Server Not Responding
Symptom: Azure Pricing MCP calls fail.
Solutions:
```bash
# Check MCP configuration
cat .vscode/mcp.json

# Verify Python environment
python3 --version  # Should be 3.10+

# Install dependencies
cd tools/mcp-servers/azure-pricing && pip install -r requirements.txt
```
9. Dev Container Build Fails
Symptom: Dev container won’t start.
Common causes:
- Docker not running
- Port conflicts
- Outdated base image
Solutions:
```bash
# Rebuild without cache
# In VS Code: Ctrl+Shift+P → "Dev Containers: Rebuild Container Without Cache"
```
Check Docker is running:
```bash
docker ps
```
10. Orphaned VS Code Extensions Injecting Unwanted Instructions
Symptom: Copilot loads instruction files from extensions that are not listed in devcontainer.json
(e.g., ms-azuretools.vscode-azure-github-copilot). You may see unexpected rules or context being
injected into agent conversations.
Cause: Extension directories can persist in ~/.vscode-server/extensions/ even after an extension
is removed from the devcontainer.json extensions list. VS Code auto-loads instruction files from any
extension on disk, regardless of whether it is actively managed.
Solution:
1. List orphaned extensions:
   ```bash
   # Compare installed extensions against devcontainer.json
   ls ~/.vscode-server/extensions/ | sort > /tmp/installed.txt
   # Look for anything not in your devcontainer.json extensions list
   ```
2. Remove the orphaned extension directory:
   ```bash
   rm -rf ~/.vscode-server/extensions/<orphaned-extension-folder>
   ```
3. Reload the VS Code window (Ctrl+Shift+P → “Developer: Reload Window”).
Note: Orphaned extensions may reappear after a dev container rebuild from a cached Docker layer. If this happens, rebuild without cache: Ctrl+Shift+P → “Dev Containers: Rebuild Container Without Cache”.
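The comparison against devcontainer.json can be partly automated. The sketch below rests on two assumptions: extension folders are named `<publisher>.<name>-<version>`, and you have copied the expected extension IDs from devcontainer.json into a plain text file, one per line. The function name and that file are hypothetical.

```bash
# Print installed extension IDs that are absent from an expected list.
# $1: extensions directory
# $2: text file with one expected extension ID per line (hypothetical file)
orphaned_extensions() {
  # Strip the -<version> suffix from each folder name, skip the
  # extensions.json index file if present, then drop every ID found in the
  # expected list (case-insensitive, whole-line, fixed-string match).
  ls "$1" |
    grep -v '^extensions\.json$' |
    sed -E 's/-[0-9]+\.[0-9]+\.[0-9]+.*$//' |
    sort -u |
    grep -ivxF -f "$2" || true
}
```

For example, `orphaned_extensions ~/.vscode-server/extensions expected-extensions.txt` prints candidate orphans. Review the list before deleting anything, since some extensions are installed as dependencies of others.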
11. Git Push Fails with Lefthook Errors
Symptom: Pre-commit hooks fail.
Common hooks:
| Hook | Command | Fix |
|---|---|---|
| Artifact validation | npm run validate | Fix H2 structure |
| Markdown lint | npm run lint:md | Fix markdown issues |
| Commitlint | commitlint | Use conventional commit format |
Skip hooks temporarily (not recommended):
```bash
git commit --no-verify -m "fix: temporary"
```
12. Handoff Prompt Not Working
Symptom: Agent handoff button does nothing.
Causes:
- Handoff target agent doesn’t exist
- YAML handoffs section malformed
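The first cause can be checked mechanically. The sketch below is a hypothetical helper that rests on two assumptions: handoff targets appear on their own `agent: <name>` lines, and agent files are named `<name>.agent.md`, optionally with a prefix such as `03-`. It does not parse YAML, so treat a clean result as a hint rather than proof.

```bash
# Print handoff targets named in agent file $1 that have no matching
# *<name>.agent.md file in directory $2.
missing_handoff_targets() {
  sed -n 's/^[[:space:]]*agent:[[:space:]]*//p' "$1" | tr -d '"' | sort -u |
    while IFS= read -r target; do
      [ -n "$target" ] || continue
      # The leading * allows numbered file names such as 03-architect.agent.md
      ls "$2"/*"$target".agent.md >/dev/null 2>&1 || echo "$target"
    done
}
```

For example, `missing_handoff_targets .github/agents/requirements.agent.md .github/agents` prints any target name that has no agent file behind it.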
Check handoffs syntax:
```yaml
handoffs:
  - label: "Create WAF Assessment"
    agent: architect
    prompt: "Assess requirements for WAF..."
    send: true
```
Ensure target agent exists:
```bash
ls .github/agents/03-architect.agent.md
```
Diagnostic Commands
Environment Check
```bash
# All-in-one status
echo "=== Bicep ===" && bicep --version
echo "=== Terraform ===" && terraform --version
echo "=== TFLint ===" && tflint --version
echo "=== Azure CLI ===" && az version --output table
echo "=== Node ===" && node --version
echo "=== Python ===" && python3 --version
echo "=== Git ===" && git --version
```
Workspace Validation
```bash
# Validate all artifacts
npm run validate:all

# Bicep validation
bicep lint infra/bicep/{project}/main.bicep
bicep build infra/bicep/{project}/main.bicep

# Terraform validation (the subshell keeps the working directory unchanged)
(cd infra/terraform/{project} && terraform init -backend=false && terraform validate)
npm run validate:terraform

# Lint markdown
npm run lint:md
```
Azure Status
```bash
# Current subscription
az account show --output table

# List resource groups
az group list --output table

# Check deployments
az deployment group list -g {resource-group} --output table
```
Getting Help
- Check prompt guide: Prompt Guide has usage examples
- Read agent definitions: `.github/agents/*.agent.md`
- Check skill files: `.github/skills/*/SKILL.md`
- Review templates: `.github/skills/azure-artifacts/templates/`
Still Stuck?
Use the diagnose agent (🔍 Sentinel):
```text
Ctrl+Shift+A → diagnose
"My bicep-codegen agent isn't generating valid templates"
```
Or start the Orchestrator (🧠 Orchestrator) for a guided workflow:
```text
Ctrl+Shift+I → Orchestrator
"Help me troubleshoot my Azure deployment"
```
Related
- Quickstart: install and run your first project
- Workflow: how agents collaborate across steps
- Session Debugging: inspect session state and resume