Hints & Tips

Coaching approach: These hints use questions to guide your thinking. The best solutions come from understanding why, not just copying what.

Something Broken?

If you're stuck on an error rather than a design question, go to Troubleshooting first. The Quick Diagnosis table at the top will route you to the right fix.

Understanding Agent Output Templates

📄 How Agent Outputs Work (click to reveal)

Templatized Agent Outputs

The agents in this microhack use templates to generate consistent, structured documentation. These templates are located in .github/skills/azure-artifacts/templates/ and include:

.github/skills/azure-artifacts/templates/
├── 01-requirements.template.md
├── 02-architecture-assessment.template.md
├── 03-des-cost-estimate.template.md
├── 04-implementation-plan.template.md
├── 06-deployment-summary.template.md
├── 07-operations-runbook.template.md
└── ... (and more)

Why Templates Matter

Deterministic Behavior: Templates make agent outputs more predictable and consistent. The agent fills in the template structure with your specific requirements.
Quality Assurance: Templates ensure all critical sections are covered — the agent won't forget important aspects like security considerations or cost breakdowns.
Professional Standards: The output follows industry best practices for documentation format and content organization.

What This Means for You

Expect structured output: Agent responses follow a predictable format
Focus on content, not format: The template handles structure; you focus on requirements
Understand the patterns: Reviewing templates helps you understand what the agent will produce

Exploring Templates

Take a moment to browse .github/skills/azure-artifacts/templates/ to understand:

What sections each template includes
What information the agent needs from you to fill them in
How your prompts influence the content (not the structure)

💡 Key insight: GenAI with templates is more predictable than "pure" generation. This is intentional — infrastructure documentation needs consistency!
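One way to picture "the template handles structure" is a check that every section heading in a template also appears in the generated document. The sketch below is illustrative only: it assumes templates mark sections with Markdown `## ` headings, and the sample text and the function names `sectionHeadings` and `missingSections` are made up for this example rather than taken from the repository.

```javascript
// Hypothetical helper: list template headings that a generated document lacks.
// ASSUMPTION: sections are marked with "## " headings (not verified against
// the real templates).
function sectionHeadings(text) {
  return text
    .split("\n")
    .filter((line) => line.startsWith("## "))
    .map((line) => line.slice(3).trim());
}

function missingSections(templateText, outputText) {
  const produced = new Set(sectionHeadings(outputText));
  return sectionHeadings(templateText).filter((h) => !produced.has(h));
}

// Made-up template and output, for illustration only.
const template = "# Requirements\n## Functional\n## Security\n## Cost";
const output = "# Requirements\n## Functional\nText\n## Cost\nText";

console.log(missingSections(template, output)); // [ 'Security' ]
```

A check like this only confirms structure. Whether the content under each heading is any good still depends on the requirements you give the agent.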

Architecture Hints

💡 Service Selection (click to reveal)

Before asking the architect agent, consider these questions:

Understanding the Requirements:

What are the key capabilities FreshConnect needs? (web portal, API, database, file storage, secrets, monitoring)
How many concurrent users need to be supported at peak?
What's the growth trajectory? (seasonal spikes, planned expansion)

Evaluating Service Options:

For the web portal: What Azure compute services support web hosting?
What are the trade-offs between App Service, Container Apps, and AKS for this workload?
Does the team have container expertise, or would PaaS be more appropriate?
For the database: What data characteristics matter most?
Relational vs. NoSQL — what does the order/customer/inventory data structure suggest?
What availability SLA is required?
How can you optimize costs for dev/test vs. production?
For cost optimization: What resources could share infrastructure?
Could web and API run on the same App Service Plan?
What's the cost difference between separate plans vs. deployment slots?

Prompt the architect agent with business context:

"Design Azure architecture for FreshConnect: farm-to-table delivery platform
serving 500 concurrent users, with order management, inventory tracking,
and delivery scheduling. Budget: €500/month. GDPR compliant (EU region).
Small team needs managed services."

💡 Coaching tip: Services aren't "recommended" — they're chosen based on requirements. What requirements drive your service selection?

💰 Cost Optimization (click to reveal)

Guiding Questions:

Before asking for cost estimates, ask yourself:

Resource Sharing: What services could share infrastructure?
Can web and API applications run on the same App Service Plan?
What's the cost impact of deployment slots vs. separate App Services?
Could you use serverless for intermittent workloads?
Right-Sizing: How do you match SKU to requirements?
What SLA do you actually need? (99.9% vs. 99.95% cost difference?)
What's the minimum tier for zone redundancy?
Could dev/test environments use lower SKUs or serverless?
Cost Discovery: How would you get actual pricing data?
What information does the Azure Pricing MCP need?
How do you compare SKU costs within a service family?
How does the deployment region affect pricing?

Example Prompt for Azure Pricing MCP:

"Compare costs for App Service plans in swedencentral:
- P1v3 (production)
- S1 (staging)
What features justify the price difference?"

💡 Coaching tip: Cost optimization isn't about picking the cheapest option — it's about matching cost to value. What does each €10/month buy you?

Estimated Budget Breakdown to Discuss:

Consider these categories for your €500/month budget:

Compute (App Service): What percentage?
Data (SQL, Storage): What percentage?
Networking (if any): What percentage?
Observability (App Insights, Log Analytics): What percentage?
Security (Key Vault): What percentage?

Where is most of your budget going? Does that align with business priorities?
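To answer the percentage questions, a few lines of arithmetic are enough. The sketch below is a minimal example, and every cost figure in it is a placeholder, not an Azure price: replace the numbers with values from your own cost estimate. The `breakdown` function name is also just an illustration.

```javascript
// Placeholder monthly figures in EUR, NOT real Azure prices.
// Replace them with numbers from your own cost estimate.
const budget = 500;
const estimate = {
  compute: 200,
  data: 150,
  networking: 0,
  observability: 40,
  security: 10,
};

function breakdown(costs, monthlyBudget) {
  const total = Object.values(costs).reduce((sum, c) => sum + c, 0);
  const shares = Object.fromEntries(
    Object.entries(costs).map(([name, cost]) => [
      name,
      Math.round((cost / monthlyBudget) * 100), // percent of the budget
    ])
  );
  return { total, remaining: monthlyBudget - total, shares };
}

const result = breakdown(estimate, budget);
console.log(result.total, result.remaining); // 400 100
console.log(result.shares.compute);          // 40
```

Seeing the shares side by side makes it easier to ask whether the largest category is also the one the business cares about most.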

🔒 Security & Compliance (click to reveal)

Discovery Questions:

GDPR Compliance:

How would you discover what GDPR requires for customer PII?
What Azure documentation or tools help identify GDPR requirements?
Which Azure regions qualify as "EU data residency"?
What logging is required for audit trails?
What technical controls implement GDPR principles?
How do you ensure data stays in the EU region? (service configuration?)
What authentication method protects customer data access?
How do you enable audit trails for compliance teams?

Security Architecture:

What security defaults should every Azure resource have?
HTTPS enforcement? TLS version? Public access settings?
How do you avoid storing secrets in code or templates?
What's the difference between connection strings and managed identities?

Prompt Engineering for Security:

Instead of asking "make it secure," try:

"Review my architecture for GDPR compliance. Data residency must be EU.
Customer PII includes: names, emails, delivery addresses, order history.
Identify gaps and recommend controls."

💡 Coaching tip: Security isn't a checklist — it's about understanding what you're protecting and why. What data does FreshConnect store? What's the impact if it's compromised?

Example Bicep Security Pattern:

// Storage Account - what does each setting protect against?
// (name, location, sku and kind are omitted here to keep the focus on security settings)
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  properties: {
    supportsHttpsTrafficOnly: true    // Why is HTTP blocked?
    minimumTlsVersion: 'TLS1_2'       // What vulnerability does this mitigate?
    allowBlobPublicAccess: false      // When would you ever want this true?
  }
  identity: {
    type: 'SystemAssigned'            // How does this improve security?
  }
}

Ask yourself: What attack does each setting prevent?

🔒 Governance Policy Errors (click to reveal)

Common Policy Errors & Fixes:

If Azure Policies are enabled, you may see deployment errors like:

| Error Message | Cause | Fix |
| --- | --- | --- |
| `RequestDisallowedByPolicy` (location) | Resource outside allowed regions | Use `swedencentral` or `germanywestcentral` |
| `RequestDisallowedByPolicy` (tag) | Missing required tag | Add `Environment` and `Project` tags |
| `RequestDisallowedByPolicy` (SQL auth) | SQL password auth attempted | Set `azureADOnlyAuthentication: true` |
| `RequestDisallowedByPolicy` (HTTPS) | HTTPS not enabled | Set `supportsHttpsTrafficOnly: true` |
| `RequestDisallowedByPolicy` (TLS) | TLS version too low | Set `minimumTlsVersion: 'TLS1_2'` |
| `RequestDisallowedByPolicy` (public blob) | Public blob access enabled | Set `allowBlobPublicAccess: false` |

Required Bicep Settings:

// Storage Account - all required for policy compliance
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: storageAccountName
  location: location
  tags: tags  // Must include Environment and Project!
  sku: {
    name: 'Standard_LRS'  // Assumed SKU; choose the one your design calls for
  }
  kind: 'StorageV2'       // sku and kind are required to deploy a storage account
  properties: {
    supportsHttpsTrafficOnly: true
    minimumTlsVersion: 'TLS1_2'
    allowBlobPublicAccess: false
  }
}

// SQL Server - Azure AD only
resource sqlServer 'Microsoft.Sql/servers@2023-05-01-preview' = {
  name: sqlServerName
  location: location
  tags: tags
  properties: {
    administrators: {
      administratorType: 'ActiveDirectory'
      azureADOnlyAuthentication: true  // This flag belongs inside administrators
      // ... AD admin config
    }
  }
}

// App Service - HTTPS only
resource webApp 'Microsoft.Web/sites@2023-01-01' = {
  name: appName
  location: location
  tags: tags
  properties: {
    httpsOnly: true
    siteConfig: {
      minTlsVersion: '1.2'
    }
  }
}
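Before deploying, it can help to reason about the policy table as a set of checks. The following is a hypothetical sketch, not an Azure tool: the input is a plain object you write by hand to describe a storage account, and the function name `checkStorageAccount` and the message strings are invented for illustration. Only the allowed regions, required tags, and property names come from the table above.

```javascript
// Hypothetical pre-deployment check mirroring the policy table above.
// The input is a hand-written object, not a real Azure API response.
const ALLOWED_REGIONS = ["swedencentral", "germanywestcentral"];
const REQUIRED_TAGS = ["Environment", "Project"];

function checkStorageAccount(resource) {
  const problems = [];
  const props = resource.properties || {};
  const tags = resource.tags || {};

  if (!ALLOWED_REGIONS.includes(resource.location)) {
    problems.push("location: resource outside allowed regions");
  }
  for (const tag of REQUIRED_TAGS) {
    if (!(tag in tags)) problems.push(`tag: missing ${tag}`);
  }
  if (props.supportsHttpsTrafficOnly !== true) {
    problems.push("HTTPS: set supportsHttpsTrafficOnly to true");
  }
  // For simplicity, anything other than the table's 'TLS1_2' is flagged.
  if (props.minimumTlsVersion !== "TLS1_2") {
    problems.push("TLS: set minimumTlsVersion to 'TLS1_2'");
  }
  if (props.allowBlobPublicAccess !== false) {
    problems.push("public blob: set allowBlobPublicAccess to false");
  }
  return problems;
}

// Made-up draft with several deliberate mistakes.
const draft = {
  location: "westeurope",
  tags: { Environment: "dev" },
  properties: { supportsHttpsTrafficOnly: true, minimumTlsVersion: "TLS1_0" },
};
console.log(checkStorageAccount(draft));
```

The draft above trips four checks (region, missing `Project` tag, TLS version, and public blob access left unset). Azure Policy remains the source of truth; the sketch only helps you predict what it will reject.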

🏗️ Bicep Patterns (click to reveal)

UniqueString Pattern:

// main.bicep - Generate ONCE, pass everywhere
var uniqueSuffix = uniqueString(resourceGroup().id)

// In modules, receive as parameter
param uniqueSuffix string

// Use in resource names
var kvName = 'kv-${take(projectName, 8)}-${environment}-${take(uniqueSuffix, 6)}'

Naming Constraints:

| Resource | Max Length | Allowed Chars |
| --- | --- | --- |
| Key Vault | 24 | alphanumeric, hyphens |
| Storage Account | 24 | lowercase, numbers only |
| SQL Server | 63 | lowercase, numbers, hyphens |
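The Key Vault naming pattern above can be checked against the 24-character limit before anything is deployed. This sketch mirrors the Bicep expression in plain JavaScript, using `slice` where Bicep uses `take`. The project name and suffix are made-up sample values, and the helper names are hypothetical.

```javascript
// Mirrors the Bicep expression
//   'kv-${take(projectName, 8)}-${environment}-${take(uniqueSuffix, 6)}'
// so the resulting length can be checked up front.
function keyVaultName(projectName, environment, uniqueSuffix) {
  return `kv-${projectName.slice(0, 8)}-${environment}-${uniqueSuffix.slice(0, 6)}`;
}

function isValidKeyVaultName(name) {
  // Limits from the table: at most 24 characters, alphanumerics and hyphens.
  return name.length <= 24 && /^[A-Za-z0-9-]+$/.test(name);
}

// Sample values only; the suffix stands in for a uniqueString() result.
for (const env of ["dev", "prod", "staging"]) {
  const name = keyVaultName("freshconnect", env, "abc123def4567");
  console.log(name, name.length, isValidKeyVaultName(name));
}
// kv-freshcon-dev-abc123 22 true
// kv-freshcon-prod-abc123 23 true
// kv-freshcon-staging-abc123 26 false
```

With an 8-character project prefix and a 6-character suffix, the pattern leaves room for an environment name of at most 5 characters. A longer value such as `staging` needs a shorter code or a shorter prefix.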

Required Tags:

var tags = {
  Environment: environment      // dev, staging, prod
  ManagedBy: 'Bicep'
  Project: projectName
  Owner: owner
}

🌍 Multi-Region DR (Challenge 4)

Discovery Questions:

When the DR curveball hits, instead of looking for "the answer," ask:

Business Impact: What does "disaster recovery" mean for FreshConnect?
If swedencentral goes down, what business operations must continue?
What's the cost of 1 hour of downtime? 4 hours? 24 hours?
Which data can you afford to lose? (RPO question)
Technical Options: What Azure services support multi-region?
Can App Services fail over automatically, or do you need Traffic Manager?
What's the difference between SQL geo-replication and failover groups?
Does storage need to be in both regions, or can you use GRS?
How do secrets (Key Vault) work across regions?
Cost Trade-offs: What does HA/DR cost?
Budget increases from €500 → €700. What does that extra €200 buy?
Which components are most expensive to replicate?
Could you do active-passive instead of active-active?
Architecture Documentation: How do you communicate the change?
What format best shows before/after architecture?
Should you document this as an ADR (Architecture Decision Record)?
What context does your team need to understand why you chose this approach?
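The downtime-cost and budget questions above can be turned into a quick break-even calculation. This is a rough sketch under one loud assumption: the revenue lost per hour of outage is a placeholder, since the real figure has to come from the business. Only the €500 and €700 budget values are taken from the hints.

```javascript
// Figures from the hints: budget goes from 500 to 700 EUR per month.
const extraMonthlyCost = 700 - 500;
const extraYearlyCost = extraMonthlyCost * 12;

// ASSUMPTION: revenue lost per hour of outage. 300 EUR is a placeholder;
// FreshConnect's real figure has to come from the business.
const lossPerHour = 300;

function outageCost(hours, hourlyLoss) {
  return hours * hourlyLoss;
}

// Hours of avoided downtime per year at which the DR spend pays for itself.
const breakEvenHours = extraYearlyCost / lossPerHour;

console.log(extraYearlyCost);             // 2400
console.log(outageCost(4, lossPerHour));  // 1200
console.log(outageCost(24, lossPerHour)); // 7200
console.log(breakEvenHours);              // 8
```

Under these placeholder numbers, DR pays for itself once it avoids about 8 hours of outage a year. Changing `lossPerHour` shows how sensitive that conclusion is to the business input.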

Prompt Engineering for DR:

"Update FreshConnect architecture for disaster recovery:
- Primary: swedencentral
- Secondary: germanywestcentral
- RTO: 4 hours, RPO: 1 hour
- Budget increased to €700/month
Recommend services and configuration changes.
Create ADR documenting decision."

💡 Coaching tip: DR isn't about copying everything twice — it's about identifying what must survive and what recovery time the business can accept.

🔥 Load Testing (Challenge 5)

Understanding Load Testing:

Before running tests, ask:

What are you testing?
Endpoint availability? Response time? Error rate under load?
Are you testing a single API endpoint or the whole application flow?
What's "success"?
What P95 response time is acceptable for users? (2 seconds? 5 seconds?)
What error rate is tolerable? (1%? 0.1%?)
How many concurrent users represent "peak load"?
What does failure tell you?
If response time degrades, what's the bottleneck? (database? compute? network?)
If errors spike, what's failing? (connections? timeouts? application logic?)
What would you change in the architecture to handle more load?

k6 Tool (Installed in Dev Container):

k6 is a modern load testing tool. Basic structure:

import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  stages: [
    { duration: "1m", target: 100 }, // What does this stage test?
    { duration: "2m", target: 500 }, // What does this stage test?
    { duration: "1m", target: 0 }, // Why ramp down?
  ],
  thresholds: {
    http_req_duration: ["p(95)<2000"], // Why P95? Why 2000ms?
    http_req_failed: ["rate<0.01"], // Why 1%?
  },
};

export default function () {
  const res = http.get("https://your-app.azurewebsites.net/api/health");
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
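To make the two thresholds in the script above concrete, here is a small sketch of what "P95 under 2000 ms" and "failure rate under 1%" mean for a set of samples. It runs in plain Node.js, not inside k6. k6 computes its own statistics, so the nearest-rank method here is a simplification, and the sample data is made up. Treating any status outside 200-399 as failed is my understanding of k6's default, stated here as an assumption.

```javascript
// Nearest-rank percentile over response times in milliseconds.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based nearest rank
  return sorted[Math.max(rank, 1) - 1];
}

// ASSUMPTION: a response counts as failed when its status is outside 200-399.
function failureRate(statuses) {
  const failed = statuses.filter((s) => s < 200 || s >= 400).length;
  return failed / statuses.length;
}

// Made-up sample: 20 response times in ms and 20 status codes.
const durations = [120, 150, 180, 200, 210, 230, 250, 260, 300, 320,
                   350, 400, 450, 500, 600, 700, 900, 1200, 1800, 2500];
const statuses = Array(19).fill(200).concat([500]);

const p95 = percentile(durations, 95);
const rate = failureRate(statuses);
console.log(p95, p95 < 2000);   // 1800 true
console.log(rate, rate < 0.01); // 0.05 false
```

In this sample the single 2500 ms outlier does not break the P95 threshold, while one failed request out of 20 is already five times the 1% error budget. That asymmetry is worth keeping in mind when choosing thresholds.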

Analyzing Results:

After running k6, ask:

Did you meet your thresholds? If not, why not?
What Azure Monitor metrics correlate with load test results?
Would scaling up (bigger SKU) or out (more instances) help?
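When deciding between scaling up and out, it helps to know roughly how many requests per second the test generated. For a script like the k6 example in this section, where each virtual user sends one request and then sleeps, a back-of-the-envelope estimate is virtual users divided by the time per iteration. The 0.25 s response time below is an assumption chosen for illustration, not a measured value.

```javascript
// Rough request rate produced by a closed-loop script in which every
// virtual user sends one request and then sleeps. Ignores ramp-up,
// connection setup and script overhead.
function requestsPerSecond(virtualUsers, avgResponseSeconds, sleepSeconds) {
  return virtualUsers / (avgResponseSeconds + sleepSeconds);
}

// ASSUMPTION: 0.25 s average response time, chosen only for illustration.
console.log(requestsPerSecond(100, 0.25, 1)); // 80
console.log(requestsPerSecond(500, 0.25, 1)); // 400
```

Note that the rate falls as responses get slower, so a struggling backend receives fewer requests from the same number of virtual users. Compare the estimate with the request counts in Azure Monitor rather than relying on it alone.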

💡 Coaching tip: Load testing isn't pass/fail — it's discovering your system's limits so you can make informed decisions.

📚 Documentation (Challenge 6)

The Documentation Question:

The infrastructure works today. Will your team understand it tomorrow?

Discovery Questions:

Who's the audience?
Operations team troubleshooting at 2 AM?
New developer joining the team?
Compliance auditor asking for DR procedures?
CFO asking why costs increased?
What questions does documentation answer?
"How do I fix it?" → Operational runbook
"How does it work?" → Architecture overview
"What does it cost?" → Cost breakdown
"Is it compliant?" → Security/compliance docs
"How do I deploy changes?" → Deployment guide
What makes documentation useful?
Step-by-step procedures vs. conceptual overviews?
Diagrams vs. text descriptions?
Troubleshooting decision trees?
Links to Azure Portal resources?

Prompt Engineering for Documentation:

"Generate operational runbook for FreshConnect targeted at on-call engineers.

Context:
- Deployed in swedencentral with DR in germanywestcentral
- Using App Service, Azure SQL, Storage Account, Key Vault
- Common scenarios: high latency, connection errors, storage throttling

Include:
- Initial assessment checklist (first 60 seconds)
- Diagnostic steps for common scenarios
- Azure CLI commands for health checks
- Escalation criteria"

💡 Coaching tip: The design agent can generate multiple document types. Which documents provide the most value for FreshConnect's specific needs?

Document Types to Consider:

Operations runbook (troubleshooting)
Architecture documentation (system understanding)
Cost estimate with optimization guide
Disaster recovery procedures
Deployment guide with rollback steps
Security and compliance documentation

🔍 Diagnostics (Challenge 7)

The 2 AM Question:

Your pager goes off. FreshConnect API is slow. Error rate climbing. You have 10 minutes before customers notice.

What do you check first?

Building a Diagnostic Strategy:

What are the likely failure modes?
Database connection pool exhausted?
App Service out of memory?
Storage account throttling?
Network connectivity to dependencies?
External API timeout (for example, a payment gateway)?
What's the diagnostic sequence?
Start with application health endpoint or infrastructure metrics?
Check current state or compare to historical baseline?
Look at logs or metrics first?
What tools exist in Azure?
Azure Portal health dashboards
Application Insights queries (Kusto/KQL)
Azure Monitor metrics and alerts
Kudu console for App Service diagnostics
Azure CLI diagnostic commands
Quick fix vs. escalation?
What can the on-call engineer safely restart?
When do you wake up the architect?
What changes need approval vs. immediate action?

Prompt for Diagnostic Runbook:

"Create troubleshooting runbook for FreshConnect production incidents.

Scenarios:
- High API latency (P95 > 5 seconds)
- Database connection errors
- Storage 503 errors (throttling)

For each scenario provide:
1. Likely root causes
2. Diagnostic commands (Azure CLI, KQL queries)
3. Remediation steps
4. Escalation criteria"

💡 Coaching tip: Good diagnostic documentation includes the why, not just the what. Why check database DTU before App Service CPU? What's the reasoning?

Example Diagnostic Flow:

1. Initial Assessment (60 seconds)
   → Check Azure Status page (is it a platform issue?)
   → Check Application Insights overview (which component is failing?)
   → Check recent deployments (did something change?)

2. If High Latency Detected
   → Query: What's the P95 latency by dependency?
   → Check: Database DTU utilization
   → Check: App Service CPU/Memory
   → Decision: Scale up, scale out, or investigate query?

3. If Connection Errors
   → Check: Connection string configuration
   → Check: Managed identity permissions
   → Check: Network security groups / firewall rules
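The flow above is essentially a decision tree, and writing it as code can expose gaps in the reasoning. The sketch below is hypothetical: the signal names, the `nextChecks` function, and the 80% DTU cut-off are assumptions made for illustration. The 5000 ms latency limit echoes the "P95 > 5 seconds" scenario in the runbook prompt.

```javascript
// Hypothetical triage helper following the diagnostic flow above.
// Signal names and the 80 % DTU cut-off are illustrative assumptions.
function nextChecks(signals) {
  if (signals.platformIncident) {
    return ["Follow Azure Status updates; the issue is on the platform side"];
  }
  const steps = [];
  if (signals.recentDeployment) {
    steps.push("Review the latest deployment and consider a rollback");
  }
  if (signals.p95LatencyMs > 5000) {
    steps.push("Query P95 latency by dependency");
    steps.push(
      signals.databaseDtuPercent > 80
        ? "Database DTU is high: investigate queries or scale the database"
        : "Check App Service CPU/memory, then decide on scale up or out"
    );
  }
  if (signals.connectionErrors) {
    steps.push("Check connection string configuration");
    steps.push("Check managed identity permissions");
    steps.push("Check network security groups / firewall rules");
  }
  return steps.length > 0 ? steps : ["No known pattern matched; escalate"];
}

console.log(nextChecks({ p95LatencyMs: 6000, databaseDtuPercent: 95 }));
console.log(nextChecks({ connectionErrors: true }));
```

The final fallback is the interesting part: deciding what happens when no pattern matches is the same question as defining escalation criteria.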

Common Mistakes to Avoid

| Mistake | Solution |
| --- | --- |
| Key Vault name too long | Use `kv-${take(name, 8)}-${env}-${take(suffix, 6)}` |
| Storage account with hyphens | Use lowercase letters and numbers only |
| Missing uniqueSuffix | Generate once in main.bicep, pass to all modules |
| Hardcoded secrets | Use Key Vault references or managed identity |
| Over-engineering MVP | Keep it simple: you have one day. |
| Forgetting to deploy | Run `bicep build` often to catch errors early, and deploy incrementally |

Agent-Specific Tips

Requirements Agent (Challenge 1)

Instead of asking "what should I do?", ask:

"What NFRs should I capture for a farm-to-table delivery platform?"
"How do I translate 'peak season = 3x volume' into technical requirements?"
"What questions help uncover hidden requirements?"

💡 Be specific about business context, not just technical features.

Architect Agent (Challenge 2)

Instead of "design my architecture", try:

"What are the trade-offs between App Service and Container Apps for this workload?"
"How does the €500/month budget constraint affect service selection?"
"What WAF pillar is most at risk with this approach?"

💡 Question recommendations — ask "why this service?" not just "what service?"

Bicep Plan Agent (Challenge 3)

Instead of "write Bicep", ask:

"What's the dependency order for deploying these resources?"
"Should Key Vault be in a separate module or main.bicep?"
"How do I structure modules for reusability?"

💡 Review the module structure before generating code.
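The dependency-order question can be explored with a small topological sort. The sketch below is for reasoning only: the dependency map is a hypothetical example, not the microhack's actual resource graph, and `deploymentOrder` is an invented name. It assumes every dependency also appears as a key in the map.

```javascript
// Kahn-style topological sort: returns an order in which every resource
// comes after the resources it depends on.
// ASSUMPTION: every dependency is also listed as a key in the map.
function deploymentOrder(dependsOn) {
  const remaining = new Map(
    Object.entries(dependsOn).map(([name, deps]) => [name, new Set(deps)])
  );
  const order = [];
  while (remaining.size > 0) {
    // Resources whose dependencies have all been placed already.
    const ready = [...remaining.keys()].filter(
      (name) => remaining.get(name).size === 0
    );
    if (ready.length === 0) throw new Error("Circular dependency detected");
    for (const name of ready) {
      order.push(name);
      remaining.delete(name);
    }
    for (const deps of remaining.values()) {
      ready.forEach((name) => deps.delete(name));
    }
  }
  return order;
}

// Hypothetical dependency map for illustration.
const graph = {
  logAnalytics: [],
  keyVault: [],
  appServicePlan: [],
  sqlServer: [],
  sqlDatabase: ["sqlServer"],
  webApp: ["appServicePlan", "keyVault", "logAnalytics"],
};
console.log(deploymentOrder(graph));
```

Bicep works out this order itself from references between resources, so you do not need to run anything like this. Sketching the graph is still a useful way to decide which resources belong together in a module.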

Bicep Code Agent (Challenge 3)

Instead of "generate all the code", try:

"Generate Key Vault module with name validation and uniqueSuffix parameter"
Start with one module, validate it works, then expand
Run bicep build after each major change

💡 Iterate incrementally — don't generate everything at once.

Design Agent (Challenges 5-7)

Instead of "document everything", ask:

"Who is the audience for this documentation?"
"What specific problem does this document solve?"
"Generate [document type] for [audience] covering [scenarios]"

💡 Good documentation answers questions before they're asked.

Still Stuck?

Ask yourself: "What question would help me discover the answer?"

If still blocked, raise your hand — facilitators are here to coach, not solve! 🙋

Remember: This microhack has 8 challenges in total, and not every team will complete all of them. Focus on learning the workflow and prompt engineering skills!