Session State Debugging
Diagnose and recover from session resume failures, stale locks, and corrupted state.
Session State Overview
Every workflow run maintains its progress in
agent-output/{project}/00-session-state.json. This file tracks:
- Which steps are complete, in progress, or pending
- Sub-step checkpoints within each step
- Decisions made during the workflow
- Lock/claim ownership for concurrent session safety
The schema version is declared in the schema_version field. The current
claim-based model (v2.0) adds atomic locking to prevent concurrent sessions
from overwriting each other. New state files should use schema_version: "2.0".
The authoritative schema definition is in
.github/skills/session-resume/references/state-file-schema.md.
A human-readable companion file 00-handoff.md summarises the same
state for manual inspection.
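
As a rough illustration of how these fields fit together, the sketch below loads a state file and groups steps by status. The field names (`schema_version`, `steps`, `status`) come from this page, but the nesting assumed here (a `steps` object keyed by step number) is an assumption; the schema reference remains authoritative. `summarise_state` is a hypothetical helper, not part of the tooling.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarise_state(path):
    """Group step numbers by status (hypothetical helper, assumed layout)."""
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    by_status = defaultdict(list)
    # Assumption: "steps" is an object keyed by step number,
    # and each entry carries a "status" field.
    for number, step in state.get("steps", {}).items():
        by_status[step.get("status", "unknown")].append(number)
    return {
        "schema_version": state.get("schema_version"),
        "steps_by_status": {k: sorted(v) for k, v in by_status.items()},
    }
```

Calling `summarise_state("agent-output/{project}/00-session-state.json")` with the project name filled in gives a quick view of where a run stopped.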
Diagnostic Flowchart
Use this decision tree when session resume is not working:
flowchart TD
A["Session resume not working?"] --> B{"Does 00-session-state.json exist?"}
B -- No --> C["Fresh start — file will be created automatically"]
B -- Yes --> D{"Is the file valid JSON?"}
D -- No --> E["Corrupted state — see Manual Recovery below"]
D -- Yes --> F{"Check lock.heartbeat — is it stale?"}
F -- Stale --> G["Stale lock — clear it manually or wait for auto-recovery"]
F -- Active --> H{"Is lock.owner_id your session?"}
H -- No --> I["Another session holds the lock — wait or clear manually"]
H -- Yes --> J{"Check steps.N.status"}
J --> K["pending → normal start"]
J --> L["in_progress → resume from sub_step checkpoint"]
J --> M["complete → step already done, move to next"]
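
The same decision tree can be walked in code. This is a minimal sketch rather than the Conductor's own logic: `diagnose` is a hypothetical function, the heartbeat is assumed to be an ISO 8601 timestamp with a UTC offset, and the stale threshold is passed in as a parameter (defaulting to the 5 minutes mentioned on this page) because the page does not say where `stale_threshold_ms` is stored. A state file with no heartbeat is treated as unlocked, which the flowchart does not cover.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


def diagnose(path, session_id, now=None, stale_threshold_ms=300_000):
    """Return a short diagnosis string following the flowchart branches."""
    path = Path(path)
    if not path.exists():
        return "fresh start"
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except ValueError:  # covers JSON and encoding errors
        return "corrupted state"
    if not isinstance(state, dict):
        return "corrupted state"

    lock = state.get("lock") or {}
    heartbeat = lock.get("heartbeat")
    if heartbeat:
        # Assumption: ISO 8601 timestamp with an offset, e.g. 2024-01-01T00:00:00Z
        beat = datetime.fromisoformat(heartbeat.replace("Z", "+00:00"))
        now = now or datetime.now(timezone.utc)
        if now - beat > timedelta(milliseconds=stale_threshold_ms):
            return "stale lock"
        if lock.get("owner_id") != session_id:
            return "lock held by another session"

    # Lock is absent or owned by this session: report each step's status.
    statuses = {n: s.get("status") for n, s in state.get("steps", {}).items()}
    return f"lock ok; step statuses: {statuses}"
```

Passing `now` explicitly makes the staleness branch easy to exercise without waiting for a real heartbeat to age.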
Common Problems
Corrupted State File
Symptoms: JSON parse errors, missing required fields, validator failures.
Fix:
- Run the validator to identify the issue:

      npm run validate:session-state

- If the file is unrecoverable, check for a backup:

      ls agent-output/{project}/00-session-state.json.bak

  If a .bak file exists, restore it:

      cp agent-output/{project}/00-session-state.json.bak agent-output/{project}/00-session-state.json

- If no backup exists, rename the corrupt file and restart:

      cd agent-output/{project}
      mv 00-session-state.json 00-session-state.json.corrupt

  The Conductor creates a fresh v2.0 state file on the next run. All steps reset to pending.
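
The recovery steps above can be sketched as a single function. `recover_state` is a hypothetical helper, not part of the tooling. It goes one step further than the manual procedure by checking that the backup itself parses before restoring it, which is an added precaution rather than documented behaviour.

```python
import json
import shutil
from pathlib import Path


def _parses(path):
    """True if the file exists and contains valid JSON."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
        return True
    except (OSError, ValueError):
        return False


def recover_state(state_path):
    """Restore from .bak when possible, otherwise set the corrupt file aside."""
    state_path = Path(state_path)
    if _parses(state_path):
        return "valid"  # nothing to recover
    backup = state_path.with_name(state_path.name + ".bak")
    if _parses(backup):
        shutil.copy2(backup, state_path)
        return "restored from backup"
    if state_path.exists():
        # Keep the broken file for inspection, as in the manual mv step.
        state_path.replace(state_path.with_name(state_path.name + ".corrupt"))
        return "quarantined"
    return "missing"
```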
Stale Lock
Symptoms: “Lock held by another session” error, but no other session is running.
A lock is considered stale when lock.heartbeat has not been updated
within the stale_threshold_ms window (default: 5 minutes).
Fix:
- Check the heartbeat timestamp:

      jq '.lock.heartbeat' agent-output/{project}/00-session-state.json

- If the timestamp is older than 5 minutes and no other session is active, clear the lock:

      jq '.lock = {}' agent-output/{project}/00-session-state.json > tmp.json
      mv tmp.json agent-output/{project}/00-session-state.json

- Resume the workflow. The Conductor will re-acquire the lock.
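
For repeated use, the check and the edit can be combined so the lock is cleared only when it really is stale. The sketch below is an assumption-laden illustration: `clear_lock_if_stale` is a hypothetical name, the heartbeat is assumed to be an ISO 8601 timestamp with a UTC offset, and the threshold is a parameter defaulting to 5 minutes. It cannot tell whether another session is genuinely still running, so that check stays manual.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta, timezone


def clear_lock_if_stale(path, now=None, stale_threshold_ms=300_000):
    """Empty the lock object if its heartbeat is older than the threshold."""
    with open(path, encoding="utf-8") as fh:
        state = json.load(fh)
    heartbeat = (state.get("lock") or {}).get("heartbeat")
    if not heartbeat:
        return False  # no lock recorded, nothing to clear
    # Assumption: ISO 8601 timestamp with an offset.
    beat = datetime.fromisoformat(heartbeat.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    if now - beat <= timedelta(milliseconds=stale_threshold_ms):
        return False  # lock is still active; leave it alone

    state["lock"] = {}
    # Write to a temp file in the same directory, then swap it into place,
    # mirroring the "> tmp.json" and "mv" pair in the manual fix.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp, path)
    return True
```

Writing to a temporary file first means a crash mid-write leaves the original state file untouched.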
Missing Steps
Symptoms: Conductor skips a step or reports it as already complete when it was never run.
Fix:
- Inspect the step status:

      jq '.steps' agent-output/{project}/00-session-state.json

- Reset the step to pending (step 4 in this example):

      jq '.steps."4".status = "pending" | .steps."4".sub_step = null' \
        agent-output/{project}/00-session-state.json > tmp.json
      mv tmp.json agent-output/{project}/00-session-state.json
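
One caveat with the jq filter is that assigning to a path creates it if it does not exist, so a mistyped step number silently adds a new entry. The sketch below applies the same reset to an already-loaded state dictionary but fails loudly on an unknown step. `reset_step` is a hypothetical helper, and the `steps` layout is assumed from the jq paths used above.

```python
import copy


def reset_step(state, step):
    """Return a copy of state with one step set back to pending."""
    key = str(step)  # step keys are strings in the jq paths, e.g. "4"
    if key not in state.get("steps", {}):
        raise KeyError(f"step {key} not found in state")
    updated = copy.deepcopy(state)  # leave the caller's dict untouched
    updated["steps"][key]["status"] = "pending"
    updated["steps"][key]["sub_step"] = None  # serialised as JSON null
    return updated
```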
Schema Version Mismatch
Symptoms: Validator warns about unknown fields or missing lock/claim structure.
The v2.0 schema added lock, claim, and attempt_token fields. If
you encounter a v1.0 state file, the Conductor will attempt to upgrade
it automatically. If it fails, manually add the missing fields or
create a fresh state file.
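
Before editing by hand, it can help to see which of the v2.0 field names are absent. This page does not say where `claim` and `attempt_token` sit in the structure, so the sketch below makes no assumption about location and simply searches for the key names at any depth. `schema_report` is a hypothetical helper and is no substitute for the validator.

```python
V2_FIELDS = ("lock", "claim", "attempt_token")  # names taken from this page


def _all_keys(node):
    """Yield every object key found anywhere in a parsed JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from _all_keys(value)
    elif isinstance(node, list):
        for item in node:
            yield from _all_keys(item)


def schema_report(state):
    """Report the declared version and which v2.0 field names never appear."""
    present = set(_all_keys(state))
    return {
        "schema_version": state.get("schema_version"),
        "missing_v2_fields": [f for f in V2_FIELDS if f not in present],
    }
```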
Decision Logging
The decisions object in the session state tracks key choices made
during the workflow:

    {
      "decisions": {
        "iac_tool": "bicep",
        "primary_region": "swedencentral",
        "complexity": "standard"
      }
    }

Write decisions at the moment they are made (Step 1 for iac_tool,
Step 2 for architecture choices). The Conductor and downstream agents
read these to route workflow steps correctly.
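
A minimal sketch of recording a decision on a loaded state dictionary follows. `record_decision` is a hypothetical helper. It updates `decisions` and also appends an entry to the `decision_log` array described in the next paragraph, using the entry shape (`step`, `key`, `value`, `reason`) shown in that example.

```python
def record_decision(state, step, key, value, reason):
    """Set the current value and append a matching audit entry."""
    # Latest value wins in "decisions"; downstream agents read from here.
    state.setdefault("decisions", {})[key] = value
    # "decision_log" is append-only, so earlier entries are never rewritten.
    state.setdefault("decision_log", []).append(
        {"step": step, "key": key, "value": value, "reason": reason}
    )
    return state
```

Keeping both writes in one call avoids the two structures drifting apart when a decision is revised.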
The decision_log array provides an append-only audit trail:

    {
      "decision_log": [
        {
          "step": 1,
          "key": "iac_tool",
          "value": "bicep",
          "reason": "Team preference and existing Bicep expertise"
        }
      ]
    }

Context Budget Strategy
Each step has a file load budget, a hard limit on how many files the agent loads at startup. This prevents context window exhaustion:
| Step | Budget | Files Loaded |
|---|---|---|
| 1 (Requirements) | 1-2 files | Session state only |
| 2 (Architecture) | 2-3 files | Requirements + session state |
| 4 (Plan) | 2-3 files | Architecture + governance constraints |
| 5 (Code) | 1-2 files | Implementation plan |
Excess files are loaded on demand via progressive disclosure. If resume
is slow, check whether context_files_used in the session state lists
more files than the step’s budget allows.
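
The budget check can be sketched as follows. The limits are the upper bounds from the table above. Two things are assumptions: that `context_files_used` is a top-level list of file paths, and that it reflects the step being checked. `files_over_budget` is a hypothetical helper.

```python
# Upper bound of each range in the budget table (steps not listed have no entry).
STEP_BUDGETS = {1: 2, 2: 3, 4: 3, 5: 2}


def files_over_budget(state, step):
    """Return how many files exceed the step's budget, or None if unknown."""
    limit = STEP_BUDGETS.get(int(step))
    if limit is None:
        return None  # the table gives no budget for this step
    # Assumption: context_files_used is a top-level list of file paths.
    used = state.get("context_files_used", [])
    return max(0, len(used) - limit)
```

A non-zero result for the current step is a reasonable first thing to rule out when resume is slow.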
Validators
Two validators check session state integrity:

    # Validate JSON schema compliance
    npm run validate:session-state

    # Validate lock/claim model integrity
    npm run validate:session-lock

Run these after manual edits to the state file to ensure consistency.