
Troubleshooting Guide

Diagnostic tools and troubleshooting

Common issues and solutions for APEX

| Agent | Codename | Common Issues |
| --- | --- | --- |
| Orchestrator | 🧠 Orchestrator | Subagent invocation not working |
| requirements | 📜 Scribe | Not appearing in list |
| architect | 🏛️ Oracle | MCP pricing not connecting |
| iac-planner | 📐 Strategist | Governance discovery failing |
| bicep-codegen | ⚒️ Forge | Validation subagents not running |
| terraform-codegen | ⚒️ Forge | Provider version mismatches |
| bicep-deploy | 🚀 Envoy | Azure auth issues |
| terraform-deploy | 🚀 Envoy | State lock / init failures |
| challenger | ⚔️ Challenger | |
| diagnose | 🔍 Sentinel | |

Before you start troubleshooting, confirm whether you are running inside the dev container or directly on your local machine. Setup fixes differ: container problems usually point to Docker or forwarded settings, while local problems usually point to missing CLIs or environment variables.
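If you are unsure which environment a terminal belongs to, a small heuristic can help. The sketch below assumes that Codespaces sets `CODESPACES=true`, that VS Code dev containers set `REMOTE_CONTAINERS=true`, and that Docker containers expose `/.dockerenv`. These are common conventions, not guarantees, and `detect_environment` is a hypothetical helper name.

```bash
# Heuristic only: report where this shell appears to be running.
# Assumes CODESPACES / REMOTE_CONTAINERS are set by the host tooling.
detect_environment() {
  if [ "${CODESPACES:-}" = "true" ]; then
    echo "codespaces"
  elif [ "${REMOTE_CONTAINERS:-}" = "true" ] || [ -f /.dockerenv ]; then
    echo "container"
  else
    echo "local"
  fi
}

# Usage: detect_environment
```

If the result is `local`, start with the CLI version checks; otherwise start with the dev container checks.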

flowchart TD
    START["Problem?"] --> TYPE{"What type?"}

    TYPE -->|"Agent won't start"| AGENT
    TYPE -->|"Skill not activating"| SKILL
    TYPE -->|"Deployment fails"| DEPLOY
    TYPE -->|"Bicep errors"| VALIDATE_B
    TYPE -->|"Terraform errors"| VALIDATE_T
    TYPE -->|"Azure auth"| AUTH

    AGENT --> AGENT1["Check: Ctrl+Shift+A<br/>shows agent list?"]
    AGENT1 -->|No| AGENT2["Reload VS Code window"]
    AGENT1 -->|Yes| AGENT3["Agent missing from list?<br/>Check .agent.md exists"]

    SKILL --> SKILL1["Using trigger keywords?"]
    SKILL1 -->|No| SKILL2["Add explicit keywords<br/>or reference skill by name"]
    SKILL1 -->|Yes| SKILL3["Check SKILL.md file<br/>for correct triggers"]

    DEPLOY --> DEPLOY1["Run preflight first:<br/>deploy agent preflight check"]

    VALIDATE_B --> VALIDATE_B1["Run: bicep build main.bicep<br/>bicep lint main.bicep"]
    VALIDATE_T --> VALIDATE_T1["Run: terraform validate<br/>terraform fmt -check"]

    AUTH --> AUTH1["az: az login<br/>azd: azd auth login --use-device-code"]

    style START fill:#e1f5fe
    style AGENT fill:#fff3e0
    style SKILL fill:#f3e5f5
    style DEPLOY fill:#c8e6c9
    style VALIDATE_B fill:#fce4ec
    style VALIDATE_T fill:#e8d5f5
    style AUTH fill:#fff9c4

Symptom: Ctrl+Shift+A doesn’t show the expected agent.

Causes:

  • Agent file not in .github/agents/ folder
  • YAML front matter syntax error
  • VS Code extension not loaded

Solutions:

# Check agent files exist
ls -la .github/agents/*.agent.md
# Validate YAML front matter
head -20 .github/agents/requirements.agent.md

Reload VS Code: Ctrl+Shift+P → “Developer: Reload Window”
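Reading the first 20 lines by eye works for one file. To scan several agent files at once, a rough check along these lines may help. It only verifies that each file opens with a `---` line and contains at least two such delimiter lines. It does not parse the YAML itself, and `check_front_matter` is a hypothetical helper name.

```bash
# Flag files whose front matter delimiters look wrong.
# This checks delimiters only; it does not validate the YAML between them.
check_front_matter() {
  status=0
  for file in "$@"; do
    first=$(head -n 1 "$file")
    delims=$(grep -c '^---[[:space:]]*$' "$file")
    if [ "$first" != "---" ] || [ "$delims" -lt 2 ]; then
      echo "front matter problem: $file"
      status=1
    fi
  done
  return $status
}

# Usage: check_front_matter .github/agents/*.agent.md
```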

2. Orchestrator/Subagent Invocation Not Working (VS Code 1.109+)


Symptom: The Orchestrator (🧠 Orchestrator) doesn’t delegate to specialized agents. Responses are instant, no terminal commands execute, and no files are created.

Root Cause: The chat.customAgentInSubagent.enabled setting is not enabled in User Settings.

Solutions:

  1. Enable in User Settings (not just workspace):

    • Press Ctrl+, → Search for customAgentInSubagent
    • Check the box to enable
    • OR add to User Settings JSON:
    {
      "chat.customAgentInSubagent.enabled": true
    }
  2. Verify agents have the agent tool:

    grep -l '"agent"' .github/agents/*.agent.md
    # Should list all main agents
  3. Verify agents have a wildcard agents array:

    grep 'agents:.*\["\*"\]' .github/agents/*.agent.md
    # Should show agents: ["*"] in each file
  4. Use Chat Diagnostics:

    • Right-click in Chat view → “Diagnostics”
    • Check all agents are loaded correctly
  5. If the session was interrupted (no new output, truncated response):

    • Check agent-output/{project}/00-session-state.json for the last completed step
    • Restart the Orchestrator with: “Resume the workflow from step X”
    • See Workflow Engine for session state details

Note: Workspace settings (.vscode/settings.json) may not be sufficient for experimental features, so enable the setting in User Settings even if it is already present in the workspace file.
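The two grep checks in steps 2 and 3 list the files that pass. To see the files that fail instead, a sketch like the following inverts them. It reuses the same patterns as above; `agents_missing_subagent_config` is a hypothetical helper name, and not every agent file is necessarily expected to pass (subagents may legitimately omit these entries).

```bash
# Report agent files that lack the "agent" tool or the wildcard agents array.
# Uses the same grep patterns as the manual checks.
agents_missing_subagent_config() {
  dir=$1
  for file in "$dir"/*.agent.md; do
    [ -e "$file" ] || continue
    grep -q '"agent"' "$file" || echo "no agent tool: $file"
    grep -q 'agents:.*\["\*"\]' "$file" || echo "no wildcard agents array: $file"
  done
}

# Usage: agents_missing_subagent_config .github/agents
```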

If the workflow already produced files before failing, resume from the same step instead of restarting the whole run. Open the failing artifact, collect the exact validation output, and feed that back into the parent agent.

Symptom: A prompt doesn’t trigger the expected skill.

Causes:

  • Missing trigger keywords in prompt
  • Skill file not in .github/skills/ folder
  • Description doesn’t match user intent

Solutions:

Use explicit skill invocation:

"Use the drawio skill to create a diagram"

Check skill triggers in SKILL.md:

head -30 .github/skills/drawio/SKILL.md
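To compare descriptions across all skills at once, something like this sketch may be useful. It assumes each SKILL.md carries a single-line `description:` key in its front matter, which this guide does not confirm, and `skill_descriptions` is a hypothetical helper name.

```bash
# Print "<skill folder>: <description>" for every skill.
# Assumes a single-line "description:" key; multi-line YAML values are not handled.
skill_descriptions() {
  for file in "$1"/*/SKILL.md; do
    [ -e "$file" ] || continue
    name=$(basename "$(dirname "$file")")
    desc=$(sed -n 's/^description:[[:space:]]*//p' "$file" | head -n 1)
    printf '%s: %s\n' "$name" "${desc:-(no description line found)}"
  done
}

# Usage: skill_descriptions .github/skills
```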

4. Deployment Fails with Azure Policy Error


Symptom: az deployment group create fails with a policy violation.

Common policies:

| Error | Cause | Solution |
| --- | --- | --- |
| "Azure AD only" | SQL Server needs AAD auth | Set azureADOnlyAuthentication: true |
| "Zone redundancy" | Wrong SKU tier | Use P1v4+ for App Service |
| "Missing tags" | Required tags absent | Add baseline tags (see iac-bicep-best-practices.instructions.md or iac-terraform-best-practices.instructions.md) + governance extras |

Run preflight check:

"Run deployment preflight for {project}"

Symptom: bicep build fails.

**Common causes**:
```bash
# Check Bicep CLI version
bicep --version # Should be 0.30+
# Validate syntax
bicep lint infra/bicep/{project}/main.bicep
```
**AVM module not found**:
```bash
# Restore modules from registry
bicep restore infra/bicep/{project}/main.bicep
```

Symptom: terraform validate or terraform plan fails.

**Common causes and solutions**:
```bash
# Check Terraform CLI version
terraform --version # Should be 1.5+
# Initialize providers (run from project directory)
cd infra/terraform/{project}
terraform init -backend=false
# Check formatting
terraform fmt -check -recursive
# Validate configuration
terraform validate
```
**Provider version mismatch**:
```bash
# Lock providers to specific versions
terraform providers lock -platform=linux_amd64
```
**AVM-TF module not found**:
Verify the module source in `main.tf` matches the Terraform Registry path:
```hcl
# Correct AVM-TF module source pattern
module "example" {
  source  = "Azure/avm-res-<provider>-<resource>/azurerm"
  version = "~> 0.x"
}
```
**TFLint errors**:
```bash
# Run TFLint with Azure ruleset
tflint --init
tflint --recursive
```

State lock issues (release a stale lock only after confirming that no other Terraform run is still using the state):

terraform force-unlock <lock-id>

Symptom: “Not logged in” or subscription errors during az or azd operations.

Azure CLI (az):

# Check (informational only; does NOT validate the token)
az account show --output table
# Mandatory: validate a real ARM token
az account get-access-token \
  --resource https://management.azure.com/ --output none
# Recovery
az login --use-device-code
az account set --subscription "<subscription-id>"

Azure Developer CLI (azd):

# Check azd auth status
azd auth login --check-status
# Login (device code works reliably in devcontainers/Codespaces)
azd auth login --use-device-code

Service principal (non-interactive):

# az
az login --service-principal \
  -u "$AZURE_CLIENT_ID" -p "$AZURE_CLIENT_SECRET" \
  --tenant "$AZURE_TENANT_ID"
# azd
azd auth login \
  --client-id "$AZURE_CLIENT_ID" \
  --client-secret "$AZURE_CLIENT_SECRET" \
  --tenant-id "$AZURE_TENANT_ID"
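The token check above can be wrapped so that scripts stop early with a clear hint instead of failing mid-deployment. This is a minimal sketch: `check_arm_token` is a hypothetical helper name, and it only uses the `az account get-access-token` call already shown in this section.

```bash
# Succeed only if the Azure CLI can issue a real ARM token.
# On failure, print the recovery command rather than running it.
check_arm_token() {
  if az account get-access-token \
      --resource https://management.azure.com/ --output none 2>/dev/null; then
    echo "ARM token OK"
    return 0
  fi
  echo "No valid ARM token. Run: az login --use-device-code" >&2
  return 1
}

# Usage: check_arm_token || exit 1
```

Printing the recovery command instead of launching an interactive login keeps the helper safe to call from non-interactive scripts.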

Symptom: npm run validate fails.

Causes:

  • Missing required H2 headings
  • Headings in wrong order
  • Using prohibited references

Check specific artifact:

# See validation rules
grep -A20 "ARTIFACT_HEADINGS" scripts/_lib/artifact-headings.mjs

Fix order issues by comparing the artifact with its template:

diff -u .github/skills/azure-artifacts/templates/01-requirements.template.md agent-output/{project}/01-requirements.md
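A full diff also shows every body-text difference, which can bury the heading problems. Since the validator cares about H2 headings and their order, a sketch that diffs only the `## ` lines may be easier to read. `compare_h2` is a hypothetical helper name, and it assumes H2 headings are written as ATX-style `## ` lines.

```bash
# Diff only the H2 headings of a template and an artifact.
# Returns 0 when the headings and their order match.
compare_h2() {
  tmp_a=$(mktemp)
  tmp_b=$(mktemp)
  grep '^## ' "$1" > "$tmp_a"
  grep '^## ' "$2" > "$tmp_b"
  diff -u "$tmp_a" "$tmp_b"
  status=$?
  rm -f "$tmp_a" "$tmp_b"
  return $status
}

# Usage:
# compare_h2 .github/skills/azure-artifacts/templates/01-requirements.template.md \
#   agent-output/{project}/01-requirements.md
```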

Symptom: Azure Pricing MCP calls fail.

Solutions:

# Check MCP configuration
cat .vscode/mcp.json
# Verify Python environment
python3 --version # Should be 3.10+
# Install dependencies
cd tools/mcp-servers/azure-pricing && pip install -r requirements.txt

Symptom: Dev container won’t start.

Common causes:

  • Docker not running
  • Port conflicts
  • Outdated base image

Solutions:

Rebuild without cache: in VS Code, press Ctrl+Shift+P → "Dev Containers: Rebuild Container Without Cache"

Check Docker is running:

docker ps

10. Orphaned VS Code Extensions Injecting Unwanted Instructions


Symptom: Copilot loads instruction files from extensions that are not listed in devcontainer.json (e.g., ms-azuretools.vscode-azure-github-copilot). You may see unexpected rules or context being injected into agent conversations.

Cause: Extension directories can persist in ~/.vscode-server/extensions/ even after an extension is removed from the devcontainer.json extensions list. VS Code auto-loads instruction files from any extension on disk, regardless of whether it is actively managed.

Solution:

  1. List orphaned extensions:

    # Compare installed extensions against devcontainer.json
    ls ~/.vscode-server/extensions/ | sort > /tmp/installed.txt
    # Look for anything not in your devcontainer.json extensions list
  2. Remove the orphaned extension directory:

    rm -rf ~/.vscode-server/extensions/<orphaned-extension-folder>
  3. Reload the VS Code window (Ctrl+Shift+P → “Developer: Reload Window”).

Note: Orphaned extensions may reappear after a dev container rebuild from a cached Docker layer. If this happens, rebuild without cache: Ctrl+Shift+P → “Dev Containers: Rebuild Container Without Cache”.
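The comparison in step 1 can be partly automated. The sketch below assumes you have copied the extension IDs from devcontainer.json into a plain text file, one ID per line (a hypothetical `expected.txt`), and that extension folders are named `<publisher>.<name>-<version>`. `list_orphans` is a hypothetical helper name.

```bash
# Print extension folders whose ID is not in the expected list.
# $1: extensions directory, $2: text file with one extension ID per line.
list_orphans() {
  ext_dir=$1
  expected=$2
  for path in "$ext_dir"/*/; do
    [ -d "$path" ] || continue
    folder=$(basename "$path")
    # Strip the trailing "-<version>" (and any platform suffix) to get the ID.
    id=$(printf '%s\n' "$folder" | sed -E 's/-[0-9]+\.[0-9]+\.[0-9]+.*$//')
    grep -qixF "$id" "$expected" || printf '%s\n' "$folder"
  done
}

# Usage: list_orphans ~/.vscode-server/extensions expected.txt
```

Review the output before deleting anything; the function only lists candidates.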

Symptom: Pre-commit hooks fail.

Common hooks:

| Hook | Command | Fix |
| --- | --- | --- |
| Artifact validation | npm run validate | Fix H2 structure |
| Markdown lint | npm run lint:md | Fix markdown issues |
| Commitlint | commitlint | Use conventional commit format |

Skip hooks temporarily (not recommended):

git commit --no-verify -m "fix: temporary"

Symptom: Agent handoff button does nothing.

Causes:

  • Handoff target agent doesn’t exist
  • YAML handoffs section malformed

Check handoffs syntax:

handoffs:
  - label: "Create WAF Assessment"
    agent: architect
    prompt: "Assess requirements for WAF..."
    send: true

Ensure target agent exists:

ls .github/agents/03-architect.agent.md
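To check every handoff target in one pass, a rough sketch like the following may help. It is a grep-based approximation, not a YAML parser. It assumes handoff targets appear on their own `agent:` lines and that agent files are named `<optional-prefix><agent>.agent.md`. `missing_handoff_targets` is a hypothetical helper name.

```bash
# Print handoff targets that have no matching agent file.
# Approximation: reads "agent: <name>" lines with grep instead of parsing YAML.
missing_handoff_targets() {
  dir=$1
  grep -hE '^[[:space:]]*(-[[:space:]]+)?agent:[[:space:]]' "$dir"/*.agent.md 2>/dev/null |
    sed -E 's/^[[:space:]]*(-[[:space:]]+)?agent:[[:space:]]*//' |
    tr -d "\"'" |
    sed 's/[[:space:]]*$//' |
    sort -u |
    while IFS= read -r target; do
      # The glob stays literal when nothing matches, so -e fails.
      set -- "$dir"/*"$target".agent.md
      [ -e "$1" ] || printf '%s\n' "$target"
    done
}

# Usage: missing_handoff_targets .github/agents
```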

Check tool versions:

# All-in-one status
echo "=== Bicep ===" && bicep --version
echo "=== Terraform ===" && terraform --version
echo "=== TFLint ===" && tflint --version
echo "=== Azure CLI ===" && az version --output table
echo "=== Node ===" && node --version
echo "=== Python ===" && python3 --version
echo "=== Git ===" && git --version
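When a CLI is not installed, the version commands above print a shell error that is easy to overlook in the output. A short sketch that reports only the missing tools can be run first. `missing_tools` is a hypothetical helper name; it relies only on the standard `command -v` builtin.

```bash
# Print one line per tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
  done
}

# Usage: missing_tools bicep terraform tflint az node python3 git
```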

Run all validations:

# Validate all artifacts
npm run validate:all
# Bicep validation
bicep lint infra/bicep/{project}/main.bicep
bicep build infra/bicep/{project}/main.bicep
# Terraform validation (the subshell keeps the working directory unchanged)
(cd infra/terraform/{project} && terraform init -backend=false && terraform validate)
npm run validate:terraform
# Lint markdown
npm run lint:md

Check Azure status:

# Current subscription
az account show --output table
# List resource groups
az group list --output table
# Check deployments
az deployment group list -g {resource-group} --output table

Where to look for more help:

  1. Check prompt guide: Prompt Guide has usage examples
  2. Read agent definitions: .github/agents/*.agent.md
  3. Check skill files: .github/skills/*/SKILL.md
  4. Review templates: .github/skills/azure-artifacts/templates/

Use the diagnose agent (🔍 Sentinel):

Ctrl+Shift+A → diagnose
"My bicep-codegen agent isn't generating valid templates"

Or start the Orchestrator (🧠 Orchestrator) for a guided workflow:

Ctrl+Shift+I → Orchestrator
"Help me troubleshoot my Azure deployment"