Lab 3: Edge RAG Setup
🚧 Lab Under Development
This lab content is complete, but the hands-on exercises are currently being validated and refined.
Expected Release: Q1 2026
You can review the lab steps and prepare your environment in advance.
Objective
Deploy a complete Edge RAG (Retrieval-Augmented Generation) solution on Azure Local, including vector database, embedding models, LLM inference engine, and RAG pipeline. This is the most comprehensive lab demonstrating AI at the edge.
Pre-Lab Checklist
PREREQUISITES
─────────────────────────────────────────────────────────────
Required:
- Completion of Lab 1 (Azure Local) and Lab 2 (Arc)
- Azure subscription with resources from prior labs
- 8+ GB RAM available for containers
- 50+ GB disk space for models
- Docker/Podman installed locally
- Python 3.10+ (for RAG script)
- curl or Postman for API testing
Optional but Recommended:
- GPU (NVIDIA/AMD) for model acceleration
- LLM model experience
- Vector database knowledge (Weaviate/Qdrant)
- REST API debugging tools
Estimated Time: 3-4 hours
Difficulty: Advanced
Cost: $50-100 Azure credits (GPU usage)
Lab Architecture
EDGE RAG SYSTEM ARCHITECTURE
─────────────────────────────────────────────────────────────
Azure Local (On-Premises)
└─ Edge RAG Solution
   ├─ RAG Application Layer
   │  ├─ FastAPI/Flask RAG Endpoint (:8000)
   │  ├─ Document Ingestion Service
   │  └─ Query Processing Pipeline
   ├─ Embedding Model (sentence-transformers MiniLM)
   ├─ Vector Store: Weaviate (:8080)
   ├─ LLM Inference Engine: Ollama (:11434)
   ├─ Storage & Persistence
   │  ├─ Volume: /data/weaviate (vector store)
   │  ├─ Volume: /data/ollama (model cache)
   │  └─ Volume: /data/documents (ingested docs)
   └─ Monitoring & Logging
      ├─ Prometheus metrics (:9090)
      ├─ Loki logs aggregation (:3100)
      └─ Grafana dashboards (:3000)
        │
        │ (Arc Integration)
        ▼
Azure (Monitoring & Backup)
├─ Azure Monitor ingests metrics
├─ Log Analytics receives logs
└─ Storage Account backs up embeddings
Lab Steps
Step 1: Prepare Edge RAG Environment
Objective: Set up prerequisites and namespace for RAG system
Step 1.1: Create Namespace
# Create dedicated namespace for RAG
kubectl create namespace edge-rag
# Label namespace for monitoring
kubectl label namespace edge-rag monitoring=enabled
# Verify namespace
kubectl get namespace edge-rag
Expected Output: Namespace 'edge-rag' created
Step 1.2: Create Storage for Models and Data
# Create PVC for persistent storage
@"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rag-data-pvc
  namespace: edge-rag
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: weaviate-pvc
  namespace: edge-rag
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
  namespace: edge-rag
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
"@ | kubectl apply -f -
# Verify PVCs
kubectl get pvc -n edge-rag
Expected Output: Three PVCs created and bound
Step 1.3: Create ConfigMap for RAG Configuration
# Create configuration for RAG pipeline
@"
apiVersion: v1
kind: ConfigMap
metadata:
  name: rag-config
  namespace: edge-rag
data:
  rag-settings.yaml: |
    vector_store:
      type: weaviate
      url: http://weaviate:8080
      batch_size: 50
      consistency_level: ALL
    
    embeddings:
      model: sentence-transformers/all-MiniLM-L6-v2
      device: cpu
      batch_size: 32
    
    llm:
      engine: ollama
      url: http://ollama:11434
      model: mistral
      temperature: 0.7
      max_tokens: 512
    
    retrieval:
      top_k: 5
      similarity_threshold: 0.7
      
    ingestion:
      chunk_size: 512
      chunk_overlap: 50
      document_path: /data/documents
"@ | kubectl apply -f -
# Verify ConfigMap
kubectl get configmap -n edge-rag
Expected Output: ConfigMap 'rag-config' created
Step 2: Deploy Vector Database (Weaviate)
Objective: Set up Weaviate vector database for embedding storage
Step 2.1: Deploy Weaviate Service
# Deploy Weaviate vector database
@"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: weaviate
  namespace: edge-rag
spec:
  replicas: 1
  selector:
    matchLabels:
      app: weaviate
  template:
    metadata:
      labels:
        app: weaviate
    spec:
      containers:
      - name: weaviate
        image: semitechnologies/weaviate:1.18.0
        ports:
        - containerPort: 8080
          name: graphql
        - containerPort: 50051
          name: grpc
        env:
        # Allow unauthenticated in-cluster access for this lab
        - name: AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED
          value: "true"
        - name: AUTHENTICATION_APIKEY_ENABLED
          value: "false"
        - name: PERSISTENCE_DATA_PATH
          value: /var/lib/weaviate
        # Vectors are generated by the RAG application, so no vectorizer module is needed
        - name: DEFAULT_VECTORIZER_MODULE
          value: "none"
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /v1/.well-known/ready
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /v1/.well-known/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        volumeMounts:
        - name: weaviate-storage
          mountPath: /var/lib/weaviate
      volumes:
      - name: weaviate-storage
        persistentVolumeClaim:
          claimName: weaviate-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: weaviate
  namespace: edge-rag
spec:
  type: ClusterIP
  ports:
  - port: 8080
    targetPort: 8080
    name: graphql
  - port: 50051
    targetPort: 50051
    name: grpc
  selector:
    app: weaviate
"@ | kubectl apply -f -
# Wait for deployment
Write-Host "Weaviate deploying (2-3 minutes)..."
kubectl wait --for=condition=ready pod -l app=weaviate -n edge-rag --timeout=300s
# Verify service
kubectl get svc -n edge-rag
kubectl get pods -n edge-rag
Expected Output: Weaviate pod running, service created
Step 2.2: Verify Weaviate Health
# Port-forward to test locally (optional)
# kubectl port-forward -n edge-rag svc/weaviate 8080:8080 &
# Get Weaviate pod IP for testing
$weaviatePod = kubectl get pods -n edge-rag -l app=weaviate -o jsonpath='{.items[0].metadata.name}'
$weaviateIP = kubectl get pod $weaviatePod -n edge-rag -o jsonpath='{.status.podIP}'
Write-Host "Weaviate Pod: $weaviatePod"
Write-Host "Weaviate IP: $weaviateIP"
# Test connectivity from another pod
kubectl run -it --rm debug --image=curlimages/curl -n edge-rag -- sh
# Inside pod: curl http://weaviate:8080/v1/.well-known/ready
Expected Output: Weaviate is ready and accessible
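Step 2.3 (Optional): Pre-Create the Document Schema
The RAG pipeline in Step 4 writes objects to a Weaviate class named "Document" and otherwise relies on Weaviate's auto-schema to create it on first insert. If you prefer to create the class explicitly, the sketch below is one way to do it; it assumes a port-forward to the weaviate service is running in another terminal (kubectl port-forward -n edge-rag svc/weaviate 8080:8080) and uses property names matching what the pipeline stores.
# Optional: create the "Document" class up front instead of relying on auto-schema
$schema = @{
    class      = "Document"
    vectorizer = "none"          # vectors are supplied by the RAG application
    properties = @(
        @{ name = "content";   dataType = @("text") },
        @{ name = "source";    dataType = @("text") },
        @{ name = "timestamp"; dataType = @("text") }
    )
} | ConvertTo-Json -Depth 5
Invoke-RestMethod -Uri "http://localhost:8080/v1/schema" `
    -Method Post `
    -ContentType "application/json" `
    -Body $schema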
Step 3: Deploy LLM Inference Engine (Ollama)
Objective: Set up Ollama for local LLM inference
Step 3.1: Deploy Ollama Service
# Deploy Ollama for LLM inference
@"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: edge-rag
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
          name: api
        env:
        - name: OLLAMA_MODELS
          value: /root/.ollama/models
        resources:
          requests:
            memory: "4Gi"
            cpu: "2000m"
          limits:
            memory: "8Gi"
            cpu: "4000m"
        livenessProbe:
          exec:
            command: ["sh", "-c", "curl -f http://localhost:11434/api/tags || exit 1"]
          initialDelaySeconds: 60
          periodSeconds: 30
        readinessProbe:
          exec:
            command: ["sh", "-c", "curl -f http://localhost:11434/api/tags || exit 1"]
          initialDelaySeconds: 30
          periodSeconds: 10
        volumeMounts:
        - name: ollama-storage
          mountPath: /root/.ollama
      volumes:
      - name: ollama-storage
        persistentVolumeClaim:
          claimName: ollama-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: edge-rag
spec:
  type: ClusterIP
  ports:
  - port: 11434
    targetPort: 11434
    name: api
  selector:
    app: ollama
"@ | kubectl apply -f -
# Wait for deployment
Write-Host "Ollama deploying (1-2 minutes)..."
kubectl wait --for=condition=ready pod -l app=ollama -n edge-rag --timeout=300s
Expected Output: Ollama pod running, service created
Step 3.2: Pull and Verify Model
# Get Ollama pod name
$ollamaPod = kubectl get pods -n edge-rag -l app=ollama -o jsonpath='{.items[0].metadata.name}'
# Pull lightweight model (Mistral 7B)
# Note: First time takes 5-10 minutes for download
Write-Host "Pulling Mistral model (this may take several minutes)..."
kubectl exec -it $ollamaPod -n edge-rag -- ollama pull mistral
# Verify model is available
kubectl exec $ollamaPod -n edge-rag -- ollama list
# Test model responsiveness
kubectl exec $ollamaPod -n edge-rag -- ollama run mistral "Hello, what is retrieval augmented generation?" | Select-Object -First 20
Expected Output: Model pulled and responding to queries
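The RAG service in Step 4 calls Ollama over its REST API (http://ollama:11434/api/generate) rather than the CLI, so it is worth exercising that endpoint directly. The sketch below assumes a port-forward to the ollama service is running in another terminal (kubectl port-forward -n edge-rag svc/ollama 11434:11434).
# Optional: test the Ollama REST API the RAG service will call
$body = @{
    model  = "mistral"
    prompt = "In one sentence, what is retrieval-augmented generation?"
    stream = $false
} | ConvertTo-Json
$result = Invoke-RestMethod -Uri "http://localhost:11434/api/generate" `
    -Method Post `
    -ContentType "application/json" `
    -Body $body
Write-Host $result.response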
Step 4: Deploy RAG Application
Objective: Deploy the RAG pipeline connecting embeddings, vector store, and LLM
Step 4.1: Create RAG Application Image (Local Build)
# Create Dockerfile for RAG application
$dockerfile = @"
FROM python:3.10-slim
WORKDIR /app
# Install dependencies
RUN pip install --no-cache-dir \
    fastapi==0.104.1 \
    uvicorn==0.24.0 \
    requests==2.31.0 \
    weaviate-client==3.25.0 \
    sentence-transformers==2.2.2 \
    torch==2.1.0 \
    PyYAML==6.0 \
    pydantic==2.5.0
# Copy RAG application
COPY app.py /app/
COPY rag_pipeline.py /app/
COPY config.yaml /app/
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
"@
$dockerfile | Out-File -FilePath Dockerfile
Write-Host "Dockerfile created"
Step 4.2: Create RAG Pipeline Code
# Save as rag_pipeline.py
@'
import requests
import json
from typing import List, Dict
import logging
logger = logging.getLogger(__name__)
class RAGPipeline:
    def __init__(self, config: Dict):
        self.config = config
        self.weaviate_url = config['vector_store']['url']
        self.ollama_url = config['llm']['url']
        self.top_k = config['retrieval']['top_k']
        self._embedding_model = None  # loaded lazily, then reused across requests

    def embed_text(self, text: str) -> List[float]:
        """Generate embeddings for text"""
        try:
            # Load the sentence-transformers model once instead of on every call
            if self._embedding_model is None:
                from sentence_transformers import SentenceTransformer
                self._embedding_model = SentenceTransformer(
                    self.config['embeddings']['model']
                )
            embedding = self._embedding_model.encode(text)
            return embedding.tolist()
        except Exception as e:
            logger.error(f"Embedding error: {e}")
            raise
    
    def store_document(self, doc_id: str, text: str, metadata: Dict) -> bool:
        """Store document in Weaviate"""
        try:
            embedding = self.embed_text(text)
            
            payload = {
                "class": "Document",
                "id": doc_id,
                "properties": {
                    "content": text,
                    "source": metadata.get("source", "unknown"),
                    "timestamp": metadata.get("timestamp", ""),
                },
                "vector": embedding
            }
            
            response = requests.post(
                f"{self.weaviate_url}/v1/objects",
                json=payload
            )
            return response.status_code == 200
        except Exception as e:
            logger.error(f"Storage error: {e}")
            raise
    
    def retrieve_context(self, query: str) -> List[str]:
        """Retrieve relevant documents for query"""
        try:
            query_embedding = self.embed_text(query)
            
            graphql_query = f"""
            {{
              Get {{
                Document(
                  nearVector: {{
                    vector: {query_embedding}
                  }}
                  limit: {self.top_k}
                ) {{
                  content
                  source
                }}
              }}
            }}
            """
            
            response = requests.post(
                f"{self.weaviate_url}/v1/graphql",
                json={"query": graphql_query}
            )
            
            if response.status_code == 200:
                results = response.json().get("data", {}).get("Get", {}).get("Document", [])
                return [doc["content"] for doc in results]
            return []
        except Exception as e:
            logger.error(f"Retrieval error: {e}")
            raise
    
    def generate_answer(self, query: str, context: List[str]) -> str:
        """Generate answer using LLM with context"""
        try:
            context_text = "\n".join(context)
            prompt = f"""Context:
{context_text}
Question: {query}
Answer:"""
            
            response = requests.post(
                f"{self.ollama_url}/api/generate",
                json={
                    "model": "mistral",
                    "prompt": prompt,
                    "stream": False
                }
            )
            
            if response.status_code == 200:
                return response.json()["response"]
            raise Exception(f"Generation error: {response.status_code}")
        except Exception as e:
            logger.error(f"Generation error: {e}")
            raise
    
    def query(self, query: str) -> Dict:
        """Full RAG query pipeline"""
        try:
            context = self.retrieve_context(query)
            answer = self.generate_answer(query, context)
            return {
                "query": query,
                "answer": answer,
                "context_documents": len(context),
                "sources": [doc[:100] + "..." for doc in context]
            }
        except Exception as e:
            logger.error(f"Query error: {e}")
            return {
                "query": query,
                "error": str(e),
                "answer": "Unable to generate answer"
            }
'@ | Set-Content -Path rag_pipeline.py -Encoding utf8
Step 4.3: Create FastAPI Application
# Save as app.py
@'
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import yaml
import logging
from rag_pipeline import RAGPipeline
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
app = FastAPI(title="Edge RAG Service", version="1.0.0")
# Load configuration
with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)
# Initialize RAG pipeline
rag_pipeline = RAGPipeline(config)
class QueryRequest(BaseModel):
    query: str
class DocumentRequest(BaseModel):
    doc_id: str
    content: str
    source: str = "unknown"
@app.get("/health")
async def health_check():
    return {"status": "healthy"}
@app.get("/config")
async def get_config():
    return config
@app.post("/query")
async def query_endpoint(request: QueryRequest):
    try:
        result = rag_pipeline.query(request.query)
        return result
    except Exception as e:
        logger.error(f"Query failed: {e}")
        raise HTTPException(status_code=500, detail=str(e))
@app.post("/ingest")
async def ingest_document(request: DocumentRequest):
    try:
        success = rag_pipeline.store_document(
            request.doc_id,
            request.content,
            {"source": request.source}
        )
        return {"success": success, "doc_id": request.doc_id}
    except Exception as e:
        logger.error(f"Ingestion failed: {e}")
        raise HTTPException(status_code=500, detail=str(e))
@app.get("/stats")
async def get_stats():
    return {
        "vector_store": config['vector_store']['url'],
        "llm_model": config['llm']['model'],
        "embeddings_model": config['embeddings']['model'],
        "retrieval_top_k": config['retrieval']['top_k']
    }
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
'@ | Set-Content -Path app.py -Encoding utf8
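Two supporting pieces are needed before the deployment in Step 4.4 will work: the Dockerfile from Step 4.1 copies a config.yaml into the image, and the deployment references a locally built rag-service:latest image with imagePullPolicy: Never. The sketches below are one way to satisfy both; the configuration values mirror the rag-config ConfigMap from Step 1.3, and the image import commands are assumptions that depend on your cluster's container runtime.
# Save as config.yaml (values mirror the rag-config ConfigMap from Step 1.3)
@"
vector_store:
  type: weaviate
  url: http://weaviate:8080
  batch_size: 50
embeddings:
  model: sentence-transformers/all-MiniLM-L6-v2
  device: cpu
  batch_size: 32
llm:
  engine: ollama
  url: http://ollama:11434
  model: mistral
  temperature: 0.7
  max_tokens: 512
retrieval:
  top_k: 5
  similarity_threshold: 0.7
ingestion:
  chunk_size: 512
  chunk_overlap: 50
  document_path: /data/documents
"@ | Set-Content -Path config.yaml -Encoding utf8
# Build the image locally, then make it available to the cluster nodes.
# The import step depends on your environment; the containerd example is illustrative only.
docker build -t rag-service:latest .
# Example for containerd-based nodes (adjust for your cluster):
# docker save rag-service:latest -o rag-service.tar
# ctr --namespace k8s.io images import rag-service.tar
In a fuller setup you could instead mount the rag-config ConfigMap from Step 1.3 into the container at /app/config.yaml, rather than baking the file into the image.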
Step 4.4: Deploy RAG Service
# Deploy RAG application
@"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-api
  namespace: edge-rag
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-api
  template:
    metadata:
      labels:
        app: rag-api
    spec:
      containers:
      - name: rag-api
        image: rag-service:latest
        imagePullPolicy: Never
        ports:
        - containerPort: 8000
          name: http
        env:
        - name: WEAVIATE_URL
          value: http://weaviate:8080
        - name: OLLAMA_URL
          value: http://ollama:11434
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
        volumeMounts:
        - name: rag-data
          mountPath: /data
      volumes:
      - name: rag-data
        persistentVolumeClaim:
          claimName: rag-data-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: rag-api
  namespace: edge-rag
spec:
  type: LoadBalancer
  ports:
  - port: 8000
    targetPort: 8000
    name: http
  selector:
    app: rag-api
"@ | kubectl apply -f -
# Wait for deployment
Write-Host "RAG API deploying..."
kubectl wait --for=condition=ready pod -l app=rag-api -n edge-rag --timeout=300s
Expected Output: RAG API pods running
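Before ingesting documents, a quick smoke test confirms the API is up and can read its configuration. The sketch below assumes a port-forward to the rag-api service in another terminal (kubectl port-forward -n edge-rag svc/rag-api 8000:8000).
# Quick smoke test of the RAG API
Invoke-RestMethod -Uri "http://localhost:8000/health" -Method Get
Invoke-RestMethod -Uri "http://localhost:8000/stats" -Method Get | ConvertTo-Json -Depth 3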
Step 5: Test RAG Pipeline
Objective: Validate end-to-end RAG functionality
Step 5.1: Ingest Sample Documents
# Get RAG API service IP
$ragApiIP = kubectl get service rag-api -n edge-rag -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
Write-Host "RAG API available at: http://$ragApiIP:8000"
# Create sample documents
$doc1 = @{
    doc_id = "doc-001"
    content = "Azure Local is Microsoft's edge computing platform for sovereign cloud deployments. It enables organizations to run cloud services on-premises with guaranteed data residency and compliance."
    source = "Azure Local Overview"
}
$doc2 = @{
    doc_id = "doc-002"
    content = "Retrieval-Augmented Generation (RAG) combines the power of large language models with targeted document retrieval. This approach improves accuracy and reduces hallucinations by grounding responses in actual data."
    source = "RAG Fundamentals"
}
$doc3 = @{
    doc_id = "doc-003"
    content = "Azure Arc enables unified management of resources across on-premises, edge, and cloud environments. It provides policy enforcement, monitoring, and governance at scale for hybrid infrastructure."
    source = "Azure Arc Overview"
}
# Ingest documents
foreach ($doc in @($doc1, $doc2, $doc3)) {
    $response = Invoke-RestMethod -Uri "http://${ragApiIP}:8000/ingest" `
        -Method Post `
        -ContentType "application/json" `
        -Body ($doc | ConvertTo-Json)
    
    Write-Host "Ingested: $($doc.doc_id) - $($response.success)"
}
Expected Output: Documents ingested successfully
Step 5.2: Query RAG System
# Test RAG queries
$queries = @(
    "What is Azure Local?",
    "How does RAG work?",
    "Tell me about Azure Arc"
)
foreach ($query in $queries) {
    Write-Host "`nQuery: $query"
    Write-Host "β" * 60
    
    $response = Invoke-RestMethod -Uri "http://${ragApiIP}:8000/query" `
        -Method Post `
        -ContentType "application/json" `
        -Body (@{ query = $query } | ConvertTo-Json)
    
    Write-Host "Answer: $($response.answer)"
    Write-Host "Sources: $($response.context_documents) documents"
}
Expected Output: RAG system returning contextual answers
Step 5.3: Monitor Performance
# Check pod logs
kubectl logs -n edge-rag -l app=rag-api --tail=50
# Monitor resource usage
kubectl top nodes
kubectl top pods -n edge-rag
# Get RAG API stats
$stats = Invoke-RestMethod -Uri "http://${ragApiIP}:8000/stats" -Method Get
Write-Host "RAG System Configuration:"
Write-Host ($stats | ConvertTo-Json -Depth 3)
Expected Output: All services healthy with reasonable resource usage
Step 6: Configure Monitoring
Objective: Set up observability for RAG system
Step 6.1: Add Prometheus Metrics
# Deploy Prometheus for metrics collection
@"
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: edge-rag
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
    scrape_configs:
    - job_name: 'rag-api'
      static_configs:
      - targets: ['rag-api:8000']
    - job_name: 'weaviate'
      static_configs:
      - targets: ['weaviate:8080']
    - job_name: 'kubernetes'
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
          - edge-rag
"@ | kubectl apply -f -
Write-Host "Prometheus ConfigMap created"
Step 6.2: Deploy Observability Stack
# Deploy Prometheus, Loki, and Grafana
Write-Host "In production, use Helm for full monitoring stack"
Write-Host "For this lab, monitoring is simplified via Kubernetes metrics"
# Verify metrics are available
kubectl top pods -n edge-rag
Step 7: Validation and Performance Testing
Objective: Verify RAG system meets performance requirements
Step 7.1: Load Testing
# Simple performance test
$queries = @(
    "What is sovereign cloud?",
    "Explain data residency",
    "What is edge computing?",
    "How does AI inference work?",
    "What is vector similarity?"
)
Write-Host "Running performance test..."
Write-Host "β" * 60
$results = @()
foreach ($query in $queries) {
    $start = Get-Date
    
    $response = Invoke-RestMethod -Uri "http://${ragApiIP}:8000/query" `
        -Method Post `
        -ContentType "application/json" `
        -Body (@{ query = $query } | ConvertTo-Json)
    
    $duration = ((Get-Date) - $start).TotalMilliseconds
    
    $results += [PSCustomObject]@{
        Query = $query
        DurationMs = [math]::Round($duration, 2)
        Success = -not $response.error
    }
}
# Display results
$results | Format-Table -AutoSize
$avgTime = ($results.DurationMs | Measure-Object -Average).Average
Write-Host "`nAverage Response Time: $([math]::Round($avgTime, 2))ms"
Expected Output: Response times under 5 seconds, high success rate
Step 7.2: Resource Efficiency Check
# Check resource efficiency
$podMetrics = kubectl top pods -n edge-rag --no-headers
Write-Host "Resource Usage Summary:"
Write-Host ("-" * 60)
$podMetrics | ForEach-Object {
    $fields = $_ -split '\s+'
    Write-Host "$($fields[0]): CPU=$($fields[1]), Memory=$($fields[2])"
}
# Check storage usage
kubectl exec -it $(kubectl get pods -n edge-rag -l app=rag-api -o jsonpath='{.items[0].metadata.name}') -n edge-rag -- df -h /data
Expected Output: Efficient resource utilization
Step 8: Next Steps and Scaling
Objective: Plan for production deployment
Step 8.1: Document System Capacity
# Get current deployment info
Write-Host "Current RAG System Configuration:"
Write-Host "β" * 60
$replicaCounts = (kubectl get deployment -n edge-rag -o jsonpath='{.items[*].spec.replicas}') -split '\s+' | ForEach-Object { [int]$_ }
Write-Host "Total Replicas: $(($replicaCounts | Measure-Object -Sum).Sum)"
$serviceInfo = kubectl get service -n edge-rag --no-headers
Write-Host "Services: $($serviceInfo.Count)"
$pvcInfo = kubectl get pvc -n edge-rag --no-headers
Write-Host "Storage Allocated: $(($pvcInfo | Measure-Object).Count) PVCs"
Write-Host "`nScaling Recommendations:"
Write-Host "- RAG API: Current 2 replicas, can scale to 5+"
Write-Host "- Weaviate: Requires persistent storage, single instance optimal"
Write-Host "- Ollama: Consider GPU-enabled node for better performance"
Step 8.2: Export Configuration for Lab 4
# Export current RAG setup for reference in Lab 4
kubectl get all -n edge-rag -o yaml > edge-rag-backup.yaml
Write-Host "Configuration exported to edge-rag-backup.yaml"
Write-Host "This will be referenced in Lab 4 for policy governance"
Learning Outcomes
What You Learned
- Edge RAG architecture and components
- Vector database deployment (Weaviate)
- LLM inference at the edge (Ollama)
- Embedding generation and vector search
- RAG pipeline implementation
- API endpoint design for ML workloads
- Performance monitoring for AI applications
- Resource optimization for inference
Skills Gained
- Deploy production-grade vector databases
- Configure local LLM inference engines
- Build RAG applications with Python/FastAPI
- Manage AI model lifecycle at the edge
- Monitor and optimize ML workload performance
- Design scalable inference architectures
- Integrate AI with existing infrastructure
Knowledge Applied From Previous Modules
- Module 1 (Azure Local): Deployed on Azure Local compute
- Module 2 (Arc): Integrated with Arc management in Lab 2
- Module 3 (Edge RAG): Core content for this lab
Troubleshooting
| Issue | Solution |
|---|---|
| Ollama model pull timeout | Increase the timeout or use a smaller model (tinyllama) |
| Weaviate connection errors | Check the pod IP: kubectl get pods -n edge-rag -o wide |
| RAG API pods crashing | Check logs: kubectl logs <pod> -n edge-rag |
| Out of memory errors | Reduce model size or increase pod limits |
| Embedding generation slow | Consider GPU acceleration or batch processing |
| Vector search returning no results | Verify documents were ingested; check the Weaviate logs |
Last Updated: October 21, 2025