Retrospective
Capture experiment learnings into a reusable skill registry. The failed attempts table is the most valuable artifact.
Overview
The Retrospective skill captures what was learned during a session — what worked, what failed, exact hyperparameters, and key insights — into a structured registry entry that can be searched and surfaced by future investigations.
This is not for mid-session notes (use the investigation's scratch/ directory for those). Retrospectives are end-of-session artifacts that preserve knowledge across /clear boundaries.
When It Triggers
| Trigger | Details |
|---|---|
| Manual | "retrospective", "capture what we learned", "save this for next time", "what did we learn", "log this experiment" |
| Auto-suggest | At approximately 20 substantive turns without a retrospective being captured |
The auto-suggestion is a recommendation, not a gate — the user can decline:
"We've been working for a while without capturing learnings.
Consider running a retrospective before /clear — the failed
attempts table is the most valuable artifact, and it won't
survive a context reset."
The Experiment Registry
Registry entries are stored in scratch/registry/{YYYY-MM-DD}-{experiment-short-name}/SKILL.md and follow a strict format:
Setup Section
Each entry records the exact experimental configuration:
- Model: architecture, size, variant
- Dataset: name, size, preprocessing
- Framework: JAX/PyTorch/etc, key libraries and versions
- Hardware: GPUs, memory, distributed setup
- Config: path to config file or inline key parameters
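A filled-in setup section might look like the following sketch. All values here are illustrative, not drawn from a real entry:

```markdown
## Setup
- Model: Transformer decoder, 350M params, RVQ variant
- Dataset: LibriSpeech 960h, 16 kHz, log-mel preprocessing
- Framework: JAX 0.4.30, Flax 0.8, Optax 0.2
- Hardware: 8x A100 80GB, single node, data parallel
- Config: configs/rvq_depth_sweep.yaml (depth=4, codebook_size=1024)
```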
The Failed Attempts Table
This is the most valuable part of any registry entry. It prevents repeating mistakes across /clear boundaries and across team members.
| Attempt | What We Tried | What Happened | Why It Failed |
|---|---|---|---|
| 1 | [specific change] | [specific result] | [root cause] |
| 2 | [specific change] | [specific result] | [root cause] |
Exact values, not vague descriptions. "learning_rate: 3e-4" not "small learning rate". "batch_size=256" not "large batch". The whole point is that someone (or a future Claude session) can reproduce or avoid these exact conditions.
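A concrete (hypothetical) failed attempts table, written at the level of precision this section demands:

```markdown
| Attempt | What We Tried | What Happened | Why It Failed |
|---|---|---|---|
| 1 | learning_rate: 3e-4 -> 1e-3 | Loss diverged at step 2k | LR too high for batch_size=256 without warmup |
| 2 | commitment_weight: 0.25 -> 1.0 | Codebook usage dropped to 12% | Over-weighted commitment froze the codebook early |
```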
Other Sections
- What Worked: Specific things that worked with exact hyperparameters, copy-paste ready
- Key Hyperparameters: Exact values in a table with the reasoning for each choice
- Findings: Key insights from the session
- Next Steps: What to try next based on results
Registry Structure
The registry lives in scratch/registry/ (gitignored):
scratch/registry/
  2026-03-15-rvq-depth-sweep/SKILL.md
  2026-03-18-fsq-codebook-size/SKILL.md
  2026-03-22-commitment-loss-scaling/SKILL.md
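The lookup over this directory layout can be sketched in a few lines. This is a minimal illustration of keyword matching against SKILL.md files, not the investigation skill's actual matching logic, which is unspecified here:

```python
from pathlib import Path

def find_matching_entries(registry_dir, keywords):
    """Return registry entry names whose SKILL.md mentions any keyword.

    Hypothetical sketch: scans scratch/registry/*/SKILL.md and does a
    case-insensitive substring match. The real skill may use the entry's
    description and tags rather than full text.
    """
    matches = []
    for skill_file in sorted(Path(registry_dir).glob("*/SKILL.md")):
        text = skill_file.read_text().lower()
        if any(kw.lower() in text for kw in keywords):
            matches.append(skill_file.parent.name)
    return matches
```

Called with `find_matching_entries("scratch/registry", ["codebook"])`, this would surface both the FSQ and RVQ entries above if their SKILL.md files mention codebooks.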
How It Feeds Back
Registry entries are searchable by the investigation skill. When a new investigation starts, matching entries are automatically surfaced based on the description and tags in each entry. This creates a feedback loop:
- Session A runs experiments and captures a retrospective
- Session A ends (or /clear happens)
- Session B starts a new investigation in a related area
- The investigation skill finds matching registry entries and surfaces them
- Session B avoids repeating Session A's failed attempts
The description field in the registry entry determines when it gets surfaced. Write trigger conditions that are hyper-specific: not "pruning experiments" but "pruning errors on ModelX with ZeRO2". This ensures the entry appears when it is actually relevant.
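As an illustration, a description field written this way might read as follows. The field layout here is a sketch, not a prescribed schema:

```markdown
description: Commitment loss scaling for VQ-VAE training. Surface when
  investigating codebook collapse or loss spikes with commitment_weight
  above 0.25 on RVQ models.
```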
Key Principles
| Principle | Details |
|---|---|
| Failed attempts > successes | They prevent repeating mistakes across /clear boundaries and across team members. |
| Hyperparameters must be exact | "3e-4" not "small learning rate". "batch_size=256" not "large batch". |
| Trigger conditions must be specific | The description determines when the entry is surfaced. Make it match the exact scenario. |
| Scratch is temporary | The registry lives in scratch/ (gitignored). When patterns are mature, distill them into the main codebase or docs. |