Monitor

A supervised self-monitoring loop for long-running training runs and multi-step command chains. Clarifies the success criteria first, prints them for the user, then iterates check → debug → verify until done.

Token-Costly by Design

/monitor runs an iterative loop that reads progress, evaluates against a checklist, and can dispatch debuggers each iteration. The skill requires explicit user confirmation before entering the loop. If you just need a single check, use a one-shot inspection instead.

Overview

Monitor turns Claude into a patient supervisor for work that takes time: training runs, hyperparameter sweeps, deploy pipelines, long command chains. It is deliberately not a dumb polling loop. Every run has three phases: clarify, loop, exit.

When It Triggers

| Trigger | Details |
| --- | --- |
| Manual | "/monitor", "watch this run", "keep an eye on", "monitor training", "run this and make sure it finishes" |
| Contextual | User hands off a multi-step chain and wants Claude to supervise through completion |

Phase 1 — Clarify Requirements

Before any loop runs, the skill extracts a concrete success checklist, hard blockers (terminate), and debuggable soft failures (try to fix). It prints a reference card for the user:

┌─ /monitor plan ──────────────────────────────────────┐
│ Goal:        <one line>                              │
│ Success criteria:                                    │
│   [ ] <item 1>                                       │
│   [ ] <item 2>                                       │
│ Hard blockers (terminate loop):                      │
│   • <item>                                           │
│ Debuggable failures (try to fix):                    │
│   • <item>                                           │
│ Budget: up to N iterations, ~M min between checks    │
└──────────────────────────────────────────────────────┘

The user confirms or edits the plan before the loop starts.
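The confirmed plan can be modeled as a small data structure. A minimal Python sketch — the class name `MonitorPlan`, its fields, and the `render` helper are all hypothetical illustrations, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class MonitorPlan:
    """A confirmed /monitor plan. Field names are illustrative."""
    goal: str
    criteria: dict        # item -> "met" | "not-yet" | "regressed"
    hard_blockers: list   # conditions that terminate the loop
    debuggable: list      # soft failures worth dispatching a debugger for
    max_iterations: int = 20
    interval_min: int = 5

    def render(self) -> str:
        """Render the reference card the user confirms before the loop starts."""
        lines = [f"Goal: {self.goal}", "Success criteria:"]
        lines += [f"  [{'x' if s == 'met' else ' '}] {item}"
                  for item, s in self.criteria.items()]
        lines.append("Hard blockers (terminate loop):")
        lines += [f"  - {b}" for b in self.hard_blockers]
        lines.append("Debuggable failures (try to fix):")
        lines += [f"  - {d}" for d in self.debuggable]
        lines.append(f"Budget: up to {self.max_iterations} iterations, "
                     f"~{self.interval_min} min between checks")
        return "\n".join(lines)
```

Keeping the plan as data rather than prose makes the later evaluate step mechanical: each iteration only has to update the `criteria` statuses.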

Phase 2 — The Loop

Each iteration runs the same four steps, logged to scratch/monitor/{YYYY-MM-DD}-{short-name}/log.md:

  1. Inspect: cheap reads of the authoritative progress source (tail logs, nvidia-smi, squeue, metric endpoint).
  2. Evaluate: mark each checklist item as met / not-yet / regressed.
  3. Decide: exit on success or hard blocker, dispatch a debugger on soft failure, otherwise schedule the next check via ScheduleWakeup.
  4. Check budget: if iterations ≥ max, stop and report status — never silently keep going.
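The four steps above can be sketched as a single loop. This is an illustrative Python sketch, not the skill's implementation: `inspect`, `evaluate`, and `debug` stand in for the real tool calls, and `schedule_wakeup` stands in for ScheduleWakeup.

```python
import time

def monitor_loop(plan, inspect, evaluate, debug, schedule_wakeup=time.sleep):
    """One supervised run: inspect -> evaluate -> decide -> budget check."""
    for iteration in range(1, plan.max_iterations + 1):
        # 1. Inspect: cheap read of the authoritative progress source.
        snapshot = inspect()
        # 2. Evaluate: mark each checklist item met / not-yet / regressed.
        plan.criteria = evaluate(snapshot)
        statuses = plan.criteria.values()
        # 3. Decide: exit on success or hard blocker, debug soft failures.
        if all(s == "met" for s in statuses):
            return "success", iteration
        if any(blocker in snapshot for blocker in plan.hard_blockers):
            return "hard-blocker", iteration
        if any(s == "regressed" for s in statuses):
            debug(snapshot)
        schedule_wakeup(plan.interval_min * 60)
    # 4. Budget exhausted: stop and report — never silently keep going.
    return "budget-exhausted", plan.max_iterations
```

Note the two exit paths that return early (success, hard blocker) versus the one that falls out of the loop (budget), matching the decision table described above.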

Phase 3 — Exit Protocol

On exit (success, failure, or budget exhaustion), the skill always produces: outcome, final checklist marks, what was done, what's left, and suggested next step. If the run produced learnings, it recommends /retrospective.
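The exit summary can be assembled mechanically from the loop's final state. A hypothetical sketch — the function name and its parameters are illustrative:

```python
def exit_report(outcome, criteria, done, remaining, next_step, learnings=False):
    """Assemble the mandatory exit summary: outcome, final checklist marks,
    what was done, what's left, and a suggested next step."""
    marks = "\n".join(
        f"  [{'x' if status == 'met' else ' '}] {item}"
        for item, status in criteria.items()
    )
    lines = [
        f"Outcome: {outcome}",
        "Final checklist:",
        marks,
        "Done: " + ("; ".join(done) or "nothing"),
        "Left: " + ("; ".join(remaining) or "nothing"),
        f"Suggested next step: {next_step}",
    ]
    if learnings:
        lines.append("Recommended: /retrospective to capture learnings.")
    return "\n".join(lines)
```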

Guardrails

| Guardrail | Why |
| --- | --- |
| No destructive writes inside the loop without per-action approval | The loop has authority to read and diagnose, not to rewrite live infra. |
| Never claim success just because nothing crashed | Evaluate against the confirmed checklist — silence is not victory. |
| Renegotiate if criteria turn out to be ill-formed | Don't march toward the wrong goal just because the user wrote it down once. |
| Keep per-iteration logs to 2–5 lines | The log is for audit, not narrative. |