Deep Research

Structured literature survey with human checkpoints. Understand the landscape before committing to an approach.

Overview

The Deep Research skill conducts a structured literature review or technical survey with human checkpoints between phases. It activates when you need to understand the state of a technique, compare existing approaches, or survey what methods exist for a problem — before jumping into implementation.

Unlike the Investigation skill (which traces code), Deep Research looks outward: papers, repos, known techniques, and documented failure modes. The two skills often work together — Investigation maps the codebase, Deep Research maps the literature.

When to Use

Triggers on: "survey", "literature review", "what's the state of", "what approaches exist for", "compare methods for", "find papers on", "what has been tried for". For reading a single specific paper, use /read-paper instead.

The Four Phases

Deep Research follows a strict four-phase pipeline, with human checkpoints between phases.

Phase 1: Scope Definition

Before searching anything, clarify what you are looking for:

  1. Create scratch/research/{YYYY-MM-DD}-{topic}/README.md with:
    • Research question: What specifically are we trying to learn?
    • Scope boundaries: What is in scope and out of scope?
    • Success criteria: What do we need to know to make a decision?
  2. Propose a research outline — the key subtopics to cover.
  3. Checkpoint with user: Get approval on the outline before proceeding. The user may want to narrow or expand scope.
Example outline:

```markdown
## Research Outline: VQ-VAE variants for action prediction
1. Original VQ-VAE and VQ-VAE-2 — baseline understanding
2. Discrete representations in robotics — who has used this and for what?
3. Codebook collapse solutions — what works?
4. Alternatives to VQ (FSQ, LFQ, RVQ) — trade-offs?
5. Integration with trajectory prediction — any existing work?
```
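The scaffolding step can be sketched as a small helper. This is illustrative only — the skill requires the README sections to exist, not this particular script, and the `scaffold` function and its template are assumptions, not part of the skill:

```python
# Sketch: scaffold scratch/research/{YYYY-MM-DD}-{topic}/README.md for Phase 1.
# The function name and template wording are illustrative assumptions.
from datetime import date
from pathlib import Path

TEMPLATE = """\
# Research: {topic}

## Research question
<what specifically are we trying to learn?>

## Scope boundaries
- In scope:
- Out of scope:

## Success criteria
<what do we need to know to make a decision?>

## Outline
1.
"""

def scaffold(topic: str, root: Path = Path("scratch/research")) -> Path:
    """Create the dated research folder and seed its README; return the README path."""
    slug = topic.lower().replace(" ", "-")
    folder = root / f"{date.today().isoformat()}-{slug}"
    folder.mkdir(parents=True, exist_ok=True)
    readme = folder / "README.md"
    readme.write_text(TEMPLATE.format(topic=topic))
    return readme
```

The outline itself is then appended under `## Outline` once the user approves it at the checkpoint.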

Phase 2: Systematic Search

For each subtopic in the approved outline:

  1. Search using web search, arxiv, and any available tools.
  2. For each relevant paper or resource found, extract:
    • Citation: Authors, title, year, venue
    • Key idea: One paragraph summary
    • Method: How it works (architecture, loss, training procedure)
    • Results: Main quantitative results and claims
    • Relevance: Why this matters for the specific question
    • Limitations: What doesn't work or isn't addressed
  3. Update the README.md with findings as you go — don't wait until the end.
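The per-resource fields above map naturally onto a small record. A minimal sketch, assuming nothing beyond the field list in this section (the class name and markdown rendering are illustrative):

```python
# Illustrative record for the per-paper fields extracted in Phase 2.
from dataclasses import dataclass

@dataclass
class PaperNote:
    citation: str     # authors, title, year, venue
    key_idea: str     # one-paragraph summary
    method: str       # architecture, loss, training procedure
    results: str      # main quantitative results and claims
    relevance: str    # why this matters for the research question
    limitations: str  # what doesn't work or isn't addressed
    url: str = ""

    def to_markdown(self) -> str:
        """Render the note as a README-ready markdown snippet."""
        lines = [f"### {self.citation}"]
        if self.url:
            lines.append(f"<{self.url}>")
        for label, text in [
            ("Key idea", self.key_idea),
            ("Method", self.method),
            ("Results", self.results),
            ("Relevance", self.relevance),
            ("Limitations", self.limitations),
        ]:
            lines.append(f"- **{label}:** {text}")
        return "\n".join(lines)
```

Rendering each note as it is collected makes the "update the README as you go" step a simple append rather than an end-of-phase rewrite.
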

Search Strategy

Breadth first, then depth. Survey the landscape before diving deep into any single paper. Don't get stuck on the first promising result. Be critical of claims — papers overstate results. Look for ablations, failure cases, and what is NOT reported.

Phase 3: Synthesis

After covering all subtopics, synthesize findings into a structured format:

  1. Comparison table: Methods side-by-side on key dimensions (method, year, key idea, pros, cons, relevance).
  2. Taxonomy: How the approaches relate. What are the major families or paradigms?
  3. Gaps: What hasn't been tried? Where is there opportunity?
  4. Recommendation: Given the specific use case, which approach(es) to try first and why.
  5. References: Full list with links.
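The comparison table can be generated from the collected rows rather than written by hand. A sketch under the column dimensions named above (the function and row format are assumptions, not part of the skill):

```python
# Sketch: render the Phase 3 comparison table as markdown.
# Columns mirror the dimensions listed above; missing values render empty.
def comparison_table(rows: list[dict]) -> str:
    cols = ["method", "year", "key idea", "pros", "cons", "relevance"]
    header = "| " + " | ".join(c.title() for c in cols) + " |"
    sep = "|" + "|".join(" --- " for _ in cols) + "|"
    body = [
        "| " + " | ".join(str(r.get(c, "")) for c in cols) + " |"
        for r in rows
    ]
    return "\n".join([header, sep] + body)
```

If the table grows too large for the README, the same output goes into comparison.md (see Output Structure below).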

Phase 4: User Review

Checkpoint with user: Present the synthesis. The user may want to dig deeper into a specific method, challenge the recommendation, add methods they know about, or refine the research question based on findings.

Output Structure

All research artifacts live in scratch/ (gitignored):

```
scratch/research/{YYYY-MM-DD}-{topic}/
  README.md           # Main synthesis document
  papers/             # Per-paper detailed notes (if needed)
    vqvae-original.md
    fsq-2023.md
  comparison.md       # Detailed comparison table (if too large for README)
```

Key Principles

| Principle | Details |
| --- | --- |
| Breadth first | Survey the landscape before diving deep into any single paper. |
| Be critical of claims | Papers overstate results. Look for ablations, failure cases, and what is NOT reported. |
| Track reproducibility | A paper with no code and vague hyperparameters is less useful than one with a working repo. |
| Date matters | A 2020 method may be superseded. Always check for more recent work that builds on it. |
| Context matters | Rank everything by relevance to the actual problem, not by general impressiveness. |

Connection to Investigation

Deep Research and the Investigation skill are complementary. Investigation maps the internal codebase — what exists, how it connects, where the gaps are. Deep Research maps the external landscape — what has been tried, what works, what the literature recommends.

A typical workflow starts with Investigation to understand the current state of the code, then uses Deep Research to find better approaches in the literature, and finally distills actionable findings from the research back into the investigation README when ready to implement.