Common Pitfalls
Known failure modes when working with Claude — and how Propel's pipeline prevents each one.
Pitfall 1: Unconstrained Implementation
Giving Claude a vague, open-ended implementation request leads to plausible-looking but fundamentally wrong code. This is the single most common failure mode and the primary reason Propel's gate system exists.
What Goes Wrong
When Claude receives an unconstrained problem like "I want to build a can transport task with robosuite", three things happen:
- Gaps are filled with training-data averages — Claude picks the most likely architecture, reward structure, and API usage based on what it has seen, not what your project needs.
- Confident but arbitrary choices are made silently — Claude won't tell you it's guessing about the reward function, the observation space, or the controller interface.
- The code compiles but is subtly wrong — the implementation looks reasonable and passes cursory review, but embeds wrong assumptions that surface much later (e.g., during training).
Why It Happens
Claude is fundamentally a pattern-matching system. When given a specific example to morph ("take PickPlace and add a transport phase"), it produces excellent results because the constraints are tight. When given an open-ended request ("build a can transport task"), it has to fill in every design decision from its training distribution — and the mean of all possible implementations is rarely the correct one for your specific use case.
Claude is great at: looking at one thing and morphing it into something you want.
Claude is bad at: creating something new from scratch when the problem is unconstrained.
Symptoms
- Claude produces a complete implementation without asking clarifying questions about architecture or reference implementations
- The code uses a plausible but wrong API (e.g., an older version of a library, or a different framework's conventions)
- Reward functions look reasonable but don't match any known working design
- The implementation doesn't follow patterns established in the rest of the codebase
- After training, the agent doesn't learn — and debugging reveals fundamental design issues, not just hyperparameter problems
How Propel Prevents This
The Questioner checkpoints (Q0 and Q1) in the Propel pipeline exist specifically to prevent unconstrained implementation. The cost of 5 minutes of scoping questions is negligible. The cost of an unconstrained implementation that compiles but trains wrong is hours of debugging and wasted compute.
Q0 (before investigation) forces the user to provide at least one of the following anchors:
- A reference codebase or repo to start from
- An architecture or design pattern to follow
- An existing example to study and adapt
- A benchmark or ground truth to verify against
- Specific APIs or framework conventions to use
Q1 (before design) forces the user to specify:
- Interface contracts and data formats
- Configuration approach and defaults
- Edge case handling
- Integration points with existing code
- Minimal vs. extended scope
If the user cannot provide any reference implementation, Q0 flags this as a high-risk unconstrained implementation so that investigation is extra thorough.
Example: Bad vs. Good
Bad (unconstrained):
User: Build a can transport task with robosuite.
Claude: [produces 300 lines of code with arbitrary choices
for reward, obs space, controller, etc.]
Good (constrained with Q0):
User: Build a can transport task with robosuite.
Claude: [Q0] Is there an existing task I should use as a
starting point?
User: Yes, look at PickPlace in robosuite/environments/manipulation/
Claude: [Q0] What architecture should I follow — same reward
phases as PickPlace?
User: Same structure, but add a transport phase between
pick and place.
Claude: [Q0] What benchmark should I verify against?
User: PickPlace test suite as baseline.
[Claude now has concrete anchors and produces correct code]
Pitfall 2: Symptom Patching
Applying a surface-level fix that suppresses a symptom without addressing the root cause. The problem appears "fixed" but the underlying bug remains, often manifesting in a different and harder-to-debug way later.
What It Looks Like
| Symptom | Patch (Wrong) | Actual Fix |
|---|---|---|
| Loss goes to NaN | Add torch.clamp() to prevent NaN | Trace the NaN to its source — the clamp masks the real problem |
| Test fails intermittently | Add @retry(3) decorator | Find the race condition or state leak causing flakiness |
| Shape mismatch error | Add a .reshape() at the error site | Fix the upstream operation that produces the wrong shape |
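The first table row can be sketched in plain Python. Everything here is illustrative: the toy normalizer, the loss functions, and the 1e-12 floor are invented for the example, not taken from any real codebase.

```python
import math

# Illustrative only: a toy normalizer whose bug (an exact 0.0 probability)
# later surfaces as -inf/NaN inside the loss.
def normalize(scores):
    total = sum(scores)
    return [s / total for s in scores]  # the real bug lives here: 0/total stays 0.0

def loss_patched(probs):
    # Symptom patch: floor the probability so log() never blows up.
    # Training "works" again, but the zero probability is still wrong.
    return -sum(math.log(max(p, 1e-12)) for p in probs)

def loss_diagnosed(probs):
    # Root-cause approach: fail loudly at the first bad value, which points
    # the trace back at the normalizer instead of masking it downstream.
    for i, p in enumerate(probs):
        assert p > 0.0, f"probs[{i}] is 0.0 -- trace the upstream normalizer"
    return -sum(math.log(p) for p in probs)
```

The patched version silently trains on a distribution that assigns zero mass to a real outcome; the diagnosed version turns the symptom into a pointer at its source.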
How Propel Prevents This
Debugger Mode enforces diagnosis before fixing. Gate 4 requires presenting the root cause, the mechanism, the proposed fix, its side effects, and — critically — what won't fix the problem and why. This format makes symptom patches obvious: a patch can't fill in the "Root Cause" and "Why This Happens" fields convincingly.
Pitfall 3: Shotgun Debugging
Making multiple changes simultaneously and hoping the problem goes away. Even if it works, you don't know which change fixed it — and you may have introduced new bugs with the other changes.
Signs You're Doing This
- A commit touches 5+ files for a "bug fix" with no clear hypothesis
- "I changed the learning rate, the loss function, and the data augmentation and now it works"
- The fix works but nobody can explain why
How Propel Prevents This
The investigation skill forces structured evidence gathering before any changes. The 3-strike limit stops repeated attempts at the same approach — after three failures, Claude must re-examine its assumptions rather than try "one more variation."
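The disciplined alternative to shotgun debugging is to vary exactly one candidate against a fixed baseline per run. A minimal sketch, where the config keys and the planted bug are hypothetical stand-ins for a real training setup:

```python
# Fixed baseline that every experiment resets to.
BASELINE = {"lr": 1e-3, "loss": "mse", "augment": False}

def run_experiment(config):
    # Stand-in for a real training run; returns True if the bug reproduces.
    # The planted bug here depends only on the loss choice.
    return config["loss"] == "mse"

def isolate(candidates):
    """Try each candidate change in isolation and record whether the bug persists."""
    results = {}
    for key, value in candidates.items():
        config = dict(BASELINE)  # reset to baseline every time
        config[key] = value      # change exactly one thing
        results[key] = run_experiment(config)
    return results
```

Only the run where the loss changed stops reproducing the bug, so that single change is the one worth understanding — instead of a three-change commit nobody can explain.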
Pitfall 4: Context Window Degradation
As a conversation grows, Claude's ability to recall and reason about earlier context degrades. Quality drops subtly — Claude starts repeating itself, forgetting constraints, or making mistakes it wouldn't make in a fresh session.
Symptoms
- Claude re-introduces a bug it already fixed earlier in the session
- Claude forgets constraints you specified 20+ messages ago
- Responses become more generic and less specific to your codebase
- Claude stops referencing the investigation findings it generated earlier
How Propel Prevents This
The context-hygiene skill prompts /clear at regular intervals. Investigation findings are written to scratch/ directories with a living README, so context survives across clears. The retrospective skill captures session learnings before clearing. Nothing important lives only in the conversation — it's always persisted to files.
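A minimal sketch of that persistence pattern, assuming a simple scratch/ layout with one markdown note per topic. The layout and filenames are illustrative, not Propel's exact scheme:

```python
from pathlib import Path

def persist_finding(root, topic, text):
    """Append a finding to scratch/<topic>.md and refresh the living README index."""
    scratch = Path(root) / "scratch"
    scratch.mkdir(parents=True, exist_ok=True)

    note = scratch / f"{topic}.md"
    with note.open("a") as f:
        f.write(text.rstrip() + "\n")

    # Living README: an index of every note, rebuilt on each write,
    # so a fresh session can re-orient itself after a /clear.
    readme = scratch / "README.md"
    entries = sorted(p.name for p in scratch.glob("*.md") if p.name != "README.md")
    readme.write_text("# Scratch index\n" + "".join(f"- {e}\n" for e in entries))
    return note
```

The point is the invariant, not the helper: every finding lands on disk before the conversation is cleared, and the README tells the next session where to look.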
Pitfall 5: Displaced Fixes
The bug is in the loss function, but the "fix" is in the data pipeline. The code where the symptom appears is not always the code where the bug lives.
How Propel Prevents This
Debugger Mode's bug classification forces identifying where the bug actually is (specific file and line), not just where the symptom appears. The data-flow-tracer agent traces values through the pipeline to find the real source. Gate 4 requires "Root Cause" with specific line numbers — vague locations are rejected.
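The tracing idea can be sketched as checking an invariant after every pipeline stage, so the report names the stage where the value first goes wrong rather than the stage where it finally crashes. Stage names and the planted bug are hypothetical:

```python
def trace_pipeline(stages, value, invariant):
    """Return (stage_name, value) for the first stage whose output breaks the invariant."""
    for name, stage in stages:
        value = stage(value)
        if not invariant(value):
            return name, value  # the bug lives here, not at the final symptom
    return None, value

# Hypothetical four-stage pipeline; the symptom would surface in `batch`,
# but the bug is planted two stages earlier.
stages = [
    ("load",      lambda xs: list(xs)),
    ("normalize", lambda xs: [x / max(xs) for x in xs]),
    ("window",    lambda xs: xs[1:]),  # planted bug: silently drops an element
    ("batch",     lambda xs: xs),
]
```

Running the trace with the invariant "length stays 4" points at `window`, which is where the fix belongs — not a compensating `.reshape()` in `batch`.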
Pitfall 6: Skipping Investigation
The most natural instinct: "I know what I want, just build it." This skips the investigation phase where Claude would discover that the existing codebase already has a pattern for this, or that the approach you have in mind conflicts with established conventions.
What Gets Missed
- Existing utilities that already solve the problem (now you have duplicated code)
- Naming conventions that the new code should follow but doesn't
- Edge cases that the existing codebase handles but the new implementation ignores
- Integration points that will break when the new code is connected
How Propel Prevents This
Gate 0 (intake) and Gate 1 (investigation) cannot be skipped in Engineer Mode. Even if you're confident, the investigation phase catches mismatches between your mental model and the actual codebase state. This takes minutes and prevents hours of rework.
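One concrete slice of that investigation can be sketched as a scan for existing definitions before writing a new one. The regex and directory layout are illustrative; a real investigation would also check call sites, tests, and docs:

```python
import re
from pathlib import Path

def find_existing_defs(root, keyword):
    """Return (file, line_no, line) for every def/class whose name mentions keyword."""
    pattern = re.compile(
        rf"^\s*(?:def|class)\s+\w*{re.escape(keyword)}\w*", re.IGNORECASE
    )
    hits = []
    for path in Path(root).rglob("*.py"):
        for i, line in enumerate(path.read_text().splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.name, i, line.strip()))
    return hits
```

If this turns up a `normalize_rewards` helper before you write your own, you have avoided the duplicated-code trap in the list above for the cost of one search.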
Summary
Every pitfall follows the same pattern: skipping a step that feels unnecessary but prevents expensive mistakes. Propel's pipeline encodes these steps as mandatory gates so they can't be skipped by habit or impatience.
| Pitfall | Propel Prevention |
|---|---|
| Unconstrained Implementation | Q0/Q1 questioner gates force scoping and references |
| Symptom Patching | Gate 4 diagnosis format requires root cause evidence |
| Shotgun Debugging | Investigation skill + 3-strike limit |
| Context Window Degradation | Context hygiene skill + scratch/ persistence |
| Displaced Fixes | Data flow tracer + bug classification with line numbers |
| Skipping Investigation | Mandatory Gate 0 + Gate 1 in all implementation modes |