Common Pitfalls

Known failure modes when working with Claude — and how Propel's pipeline prevents each one.

Pitfall 1: Unconstrained Implementation

The Fundamental Problem

Giving Claude a vague, open-ended implementation request leads to plausible-looking but fundamentally wrong code. This is the single most common failure mode and the primary reason Propel's gate system exists.

What Goes Wrong

When Claude receives an unconstrained problem like "I want to build a can transport task with robosuite", three things happen:

  1. Gaps are filled with training-data averages — Claude picks the most likely architecture, reward structure, and API usage based on what it has seen, not what your project needs.
  2. Confident but arbitrary choices are made silently — Claude won't tell you it's guessing about the reward function, the observation space, or the controller interface.
  3. The code compiles but is subtly wrong — the implementation looks reasonable and passes cursory review, but embeds wrong assumptions that surface much later (e.g., during training).

Why It Happens

Claude is fundamentally a pattern-matching system. When given a specific example to morph ("take PickPlace and add a transport phase"), it produces excellent results because the constraints are tight. When given an open-ended request ("build a can transport task"), it has to fill in every design decision from its training distribution — and the mean of all possible implementations is rarely the correct one for your specific use case.

Key Insight

Claude is great at: looking at one thing and morphing it into something you want.
Claude is bad at: creating something new from scratch when the problem is unconstrained.

Symptoms

How Propel Prevents This

The Questioner Gates

The Questioner checkpoints (Q0 and Q1) in the Propel pipeline exist specifically to prevent unconstrained implementation. The cost of 5 minutes of scoping questions is negligible. The cost of an unconstrained implementation that compiles but trains wrong is hours of debugging and wasted compute.

Q0 (before investigation) forces the user to provide:

Q1 (before design) forces the user to specify:

If the user cannot provide any reference implementation, Q0 flags this as a high-risk unconstrained implementation so that investigation is extra thorough.
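That flag can be sketched as a simple check. The field names below are illustrative, not Propel's actual Q0 schema:

```python
def q0_risk_level(answers: dict) -> str:
    """Classify a request from its Q0 answers.

    `answers` is a hypothetical mapping of Q0 questions to the user's
    responses; the key name here is a guess, not Propel's real schema.
    """
    if not answers.get("reference_implementation"):
        # No concrete anchor to morph from: treat as high-risk and
        # make the investigation phase extra thorough.
        return "high-risk: unconstrained implementation"
    return "normal"
```

With a reference like "robosuite PickPlace" supplied, the request is treated as a normal, constrained morphing task.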

Example: Bad vs. Good

Bad (unconstrained):
User:  Build a can transport task with robosuite.
Claude: [produces 300 lines of code with arbitrary choices
        for reward, obs space, controller, etc.]

Good (constrained with Q0):
User:  Build a can transport task with robosuite.
Claude: [Q0] Is there an existing task I should use as a
        starting point?
User:  Yes, look at PickPlace in robosuite/environments/manipulation/
Claude: [Q0] What architecture should I follow — same reward
        phases as PickPlace?
User:  Same structure, but add a transport phase between
        pick and place.
Claude: [Q0] What benchmark should I verify against?
User:  PickPlace test suite as baseline.
[Claude now has concrete anchors and produces correct code]

Pitfall 2: Symptom Patching

Masking the Real Problem

Applying a surface-level fix that suppresses a symptom without addressing the root cause. The problem appears "fixed" but the underlying bug remains, often manifesting in a different and harder-to-debug way later.

What It Looks Like

Symptom                    | Patch (Wrong)                    | Actual Fix
Loss goes to NaN           | Add torch.clamp() to prevent NaN | Trace the NaN to its source — the clamp masks the real problem
Test fails intermittently  | Add @retry(3) decorator          | Find the race condition or state leak causing flakiness
Shape mismatch error       | Add a .reshape() at the error site | Fix the upstream operation that produces the wrong shape
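The first row can be sketched in plain Python (math.log stands in for a torch loss, and additive smoothing is one possible root-cause repair, not the only one):

```python
import math

def probs_from_counts(counts):
    # Buggy upstream stage: a zero count yields a probability of
    # exactly 0.0, which blows up later when it reaches math.log.
    total = sum(counts)
    return [c / total for c in counts]

def nll_symptom_patch(probs, target):
    # Symptom patch: clamp at the loss site. The crash disappears,
    # but the zero-probability bug upstream is untouched and the
    # loss value for that class is now arbitrary.
    return -math.log(max(probs[target], 1e-12))

def probs_from_counts_fixed(counts, alpha=1.0):
    # Actual fix: repair the source. Additive smoothing keeps every
    # probability strictly positive before it ever reaches the loss.
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]
```

With counts of [0, 3, 7], the patched loss silently returns a large but finite number for the zero-probability class, while the fixed pipeline never produces a zero in the first place.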

How Propel Prevents This

Debugger Mode's Diagnosis-First Rule

Debugger Mode enforces diagnosis before fixing. Gate 4 requires presenting the root cause, the mechanism, the proposed fix, the side effects, and — critically — what won't fix the problem and why. This format makes symptom patches obvious: a patch that merely suppresses a symptom cannot fill in the "Root Cause" and "Why This Happens" fields convincingly.
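The diagnosis record can be sketched as a small data structure. The field names are drawn from the prose above but are illustrative; Propel's actual Gate 4 format may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Diagnosis:
    """Sketch of a Gate 4 diagnosis record (field names illustrative)."""
    root_cause: str    # specific file and line, not a vague area
    mechanism: str     # how the root cause produces the symptom
    proposed_fix: str
    side_effects: list = field(default_factory=list)
    wont_fix: dict = field(default_factory=dict)  # rejected patch -> why it only masks

    def is_convincing(self) -> bool:
        # A symptom patch struggles here: it can name a patch but not
        # a mechanism, so these fields come back empty or hand-wavy.
        return bool(self.root_cause.strip()) and bool(self.mechanism.strip())
```

A clamp-style patch would leave `root_cause` and `mechanism` effectively blank, which is exactly what the gate is designed to expose.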

Pitfall 3: Shotgun Debugging

Changing Everything at Once

Making multiple changes simultaneously and hoping the problem goes away. Even if it works, you don't know which change fixed it — and you may have introduced new bugs with the other changes.

Signs You're Doing This

How Propel Prevents This

Investigation Skills + 3-Strike Limit

The investigation skill forces structured evidence gathering before any changes. The 3-strike limit stops repeated attempts at the same approach: after three failures, Claude must re-examine its assumptions rather than try "one more variation."
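The discipline of changing one variable at a time, capped by a strike limit, can be sketched as follows. This is a hypothetical helper, not Propel code:

```python
def isolate_fix(baseline, candidate_changes, reproduces_bug, max_strikes=3):
    """Try candidate changes one at a time against a failing baseline.

    `reproduces_bug(config)` must return True while the bug persists.
    Because exactly one variable differs per run, a passing run tells
    you precisely which change mattered. After `max_strikes` failures,
    stop and re-examine assumptions instead of trying more variations.
    """
    strikes = 0
    for name, change in candidate_changes.items():
        if strikes >= max_strikes:
            return None               # 3-strike limit reached
        config = dict(baseline)       # fresh copy reverts earlier changes
        config.update(change)
        if not reproduces_bug(config):
            return name               # this single change fixed it
        strikes += 1
    return None
```

Contrast with shotgun debugging, which applies every candidate change at once: even a passing run then tells you nothing about which change was responsible.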

Pitfall 4: Context Window Degradation

Working Too Long Without Clearing

As a conversation grows, Claude's ability to recall and reason about earlier context degrades. Quality drops subtly — Claude starts repeating itself, forgetting constraints, or making mistakes it wouldn't make in a fresh session.

Symptoms

How Propel Prevents This

Context Hygiene Skill + scratch/ Directories

The context-hygiene skill prompts /clear at regular intervals. Investigation findings are written to scratch/ directories with a living README, so context survives across clears. The retrospective skill captures session learnings before clearing. Nothing important lives only in the conversation — it's always persisted to files.
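Persisting a finding to scratch/ with a living README might look like the sketch below. The layout is a guess at the pattern the skill describes, not Propel's exact convention:

```python
from datetime import date
from pathlib import Path

def persist_finding(scratch_dir, title, body):
    """Write an investigation finding to scratch/ and index it in a
    living README, so a fresh session can re-orient after any /clear.
    (Directory layout and naming are illustrative.)
    """
    scratch = Path(scratch_dir)
    scratch.mkdir(parents=True, exist_ok=True)
    note = scratch / (title.lower().replace(" ", "-") + ".md")
    note.write_text(body + "\n")
    # Append an index entry so the README alone is enough to resume.
    with (scratch / "README.md").open("a") as readme:
        readme.write("- " + str(date.today()) + " " + title + ": " + note.name + "\n")
    return note
```

The key property is that the conversation is disposable: everything needed to continue lives on disk, not in the context window.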

Pitfall 5: Displaced Fixes

Fixing the Wrong File

The bug is in the loss function, but the "fix" is in the data pipeline. The code where the symptom appears is not always the code where the bug lives.

How Propel Prevents This

Data Flow Tracer + Bug Classification

Debugger Mode's bug classification forces identifying where the bug actually is (specific file and line), not just where the symptom appears. The data-flow-tracer agent traces values through the pipeline to find the real source. Gate 4 requires "Root Cause" with specific line numbers — vague locations are rejected.
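The tracing idea can be illustrated with a minimal sketch: run a value through named pipeline stages and check an invariant after each one, so the report names the stage where the value first goes wrong rather than the stage where the symptom later appears. This is an illustration of the technique, not the data-flow-tracer agent itself:

```python
def trace_pipeline(value, stages, invariant):
    """Push `value` through (name, fn) stages, checking `invariant`
    after each. Returns the first stage whose output violates the
    invariant, i.e. where the bug lives, even if the crash or shape
    error only surfaces stages later.
    """
    for name, stage in stages:
        value = stage(value)
        if not invariant(value):
            return name
    return None  # invariant held everywhere
```

Here a length invariant pins the bug on the middle stage, even though a downstream consumer expecting three elements would be where the error message appears.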

Pitfall 6: Skipping Investigation

Jumping Straight to Implementation

The most natural instinct: "I know what I want, just build it." This skips the investigation phase where Claude would discover that the existing codebase already has a pattern for this, or that the approach you have in mind conflicts with established conventions.

What Gets Missed

How Propel Prevents This

Mandatory Gate 0 + Gate 1

Gate 0 (intake) and Gate 1 (investigation) cannot be skipped in Engineer Mode. Even if you're confident, the investigation phase catches mismatches between your mental model and the actual codebase state. This takes minutes and prevents hours of rework.
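The ordering rule can be sketched as a gate sequence in which later steps are blocked until every earlier gate has passed. Step names beyond Gates 0 and 1 are illustrative, and this is a sketch of the rule, not Propel's implementation:

```python
PIPELINE = ["gate 0: intake", "gate 1: investigation", "implementation"]

def next_required_step(completed):
    """Return the earliest pipeline step not yet done. A later step is
    never offered while an earlier gate is outstanding, no matter how
    confident the user feels.
    """
    for step in PIPELINE:
        if step not in completed:
            return step
    return None  # pipeline complete
```

Skipping straight to implementation is simply not a state the sequence can reach.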

Summary

Every pitfall follows the same pattern: skipping a step that feels unnecessary but prevents expensive mistakes. Propel's pipeline encodes these steps as mandatory gates so they can't be skipped by habit or impatience.

Pitfall                      | Propel Prevention
Unconstrained Implementation | Q0/Q1 questioner gates force scoping and references
Symptom Patching             | Gate 4 diagnosis format requires root cause evidence
Shotgun Debugging            | Investigation skill + 3-strike limit
Context Window Degradation   | Context hygiene skill + scratch/ persistence
Displaced Fixes              | Data flow tracer + bug classification with line numbers
Skipping Investigation       | Mandatory Gate 0 + Gate 1 in all implementation modes