Common Pitfalls
Known failure modes when working with Claude — and how Propel's pipeline prevents each one.
Pitfall 1: Unconstrained Implementation
Giving Claude a vague, open-ended implementation request leads to plausible-looking but fundamentally wrong code. This is the single most common failure mode and the primary reason Propel's gate system exists.
What Goes Wrong
When Claude receives an unconstrained problem like "I want to build a can transport task with robosuite", three things happen:
- Gaps are filled with training-data averages — Claude picks the most likely architecture, reward structure, and API usage based on what it has seen, not what your project needs.
- Confident but arbitrary choices are made silently — Claude won't tell you it's guessing about the reward function, the observation space, or the controller interface.
- The code compiles but is subtly wrong — the implementation looks reasonable and passes cursory review, but embeds wrong assumptions that surface much later (e.g., during training).
Why It Happens
Claude is fundamentally a pattern-matching system. When given a specific example to morph ("take PickPlace and add a transport phase"), it produces excellent results because the constraints are tight. When given an open-ended request ("build a can transport task"), it has to fill in every design decision from its training distribution — and the mean of all possible implementations is rarely the correct one for your specific use case.
Claude is great at: looking at one thing and morphing it into something you want.
Claude is bad at: creating something new from scratch when the problem is unconstrained.
Symptoms
- Claude produces a complete implementation without asking clarifying questions about architecture or reference implementations
- The code uses a plausible but wrong API (e.g., an older version of a library, or a different framework's conventions)
- Reward functions look reasonable but don't match any known working design
- The implementation doesn't follow patterns established in the rest of the codebase
- After training, the agent doesn't learn — and debugging reveals fundamental design issues, not just hyperparameter problems
How Propel Prevents This
The Questioner checkpoints (Q0 and Q1) in the Propel pipeline exist specifically to prevent unconstrained implementation. The cost of 5 minutes of scoping questions is negligible. The cost of an unconstrained implementation that compiles but trains wrong is hours of debugging and wasted compute.
Q0 (before investigation) forces the user to provide at least one of the following anchors:
- A reference codebase or repo to start from
- An architecture or design pattern to follow
- An existing example to study and adapt
- A benchmark or ground truth to verify against
- Specific APIs or framework conventions to use
Q1 (before design) forces the user to specify:
- Interface contracts and data formats
- Configuration approach and defaults
- Edge case handling
- Integration points with existing code
- Minimal vs. extended scope
If the user cannot provide any reference implementation, Q0 flags this as a high-risk unconstrained implementation so that investigation is extra thorough.
Example: Bad vs. Good
Bad (unconstrained):
User: Build a can transport task with robosuite.
Claude: [produces 300 lines of code with arbitrary choices
for reward, obs space, controller, etc.]
Good (constrained with Q0):
User: Build a can transport task with robosuite.
Claude: [Q0] Is there an existing task I should use as a
starting point?
User: Yes, look at PickPlace in robosuite/environments/manipulation/
Claude: [Q0] What architecture should I follow — same reward
phases as PickPlace?
User: Same structure, but add a transport phase between
pick and place.
Claude: [Q0] What benchmark should I verify against?
User: PickPlace test suite as baseline.
[Claude now has concrete anchors and produces correct code]
Pitfall 2: Symptom Patching
Applying a surface-level fix that suppresses a symptom without addressing the root cause. The problem appears "fixed" but the underlying bug remains, often manifesting in a different and harder-to-debug way later.
What It Looks Like
| Symptom | Patch (Wrong) | Actual Fix |
|---|---|---|
| Loss goes to NaN | Add torch.clamp() to prevent NaN | Trace the NaN to its source — the clamp masks the real problem |
| Test fails intermittently | Add @retry(3) decorator | Find the race condition or state leak causing flakiness |
| Shape mismatch error | Add a .reshape() at the error site | Fix the upstream operation that produces the wrong shape |
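The first table row can be sketched in plain Python. Everything here is illustrative: the toy normalizer, the loss functions, and the 1e-12 floor are invented for the example, not taken from any real codebase.

```python
import math

# Illustrative only: a toy normalizer whose bug (an exact 0.0 probability)
# later surfaces as -inf/NaN inside the loss.
def normalize(scores):
    total = sum(scores)
    return [s / total for s in scores]  # the real bug lives here: 0/total stays 0.0

def loss_patched(probs):
    # Symptom patch: floor the probability so log() never blows up.
    # Training "works" again, but the zero probability is still wrong.
    return -sum(math.log(max(p, 1e-12)) for p in probs)

def loss_diagnosed(probs):
    # Root-cause approach: fail loudly at the first bad value, which points
    # the trace back at the normalizer instead of masking it downstream.
    for i, p in enumerate(probs):
        assert p > 0.0, f"probs[{i}] is 0.0 -- trace the upstream normalizer"
    return -sum(math.log(p) for p in probs)
```

The patched version silently trains on a distribution that assigns zero mass to a real outcome; the diagnosed version turns the symptom into a pointer at its source.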
How Propel Prevents This
Debugger Mode enforces diagnosis before fixing. Gate 4 requires presenting the root cause, the mechanism, the proposed fix, its side effects, and — critically — what won't fix the problem and why. This format makes symptom patches obvious: a patch can't fill in the "Root Cause" and "Why This Happens" fields convincingly.
Pitfall 3: Shotgun Debugging
Making multiple changes simultaneously and hoping the problem goes away. Even if it works, you don't know which change fixed it — and you may have introduced new bugs with the other changes.
Signs You're Doing This
- A commit touches 5+ files for a "bug fix" with no clear hypothesis
- "I changed the learning rate, the loss function, and the data augmentation and now it works"
- The fix works but nobody can explain why
How Propel Prevents This
The investigation skill forces structured evidence gathering before any changes. The 3-strike limit stops repeated attempts at the same approach — after three failures, Claude must re-examine its assumptions rather than try "one more variation."
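The disciplined alternative to shotgun debugging is to vary exactly one candidate against a fixed baseline per run. A minimal sketch, where the config keys and the planted bug are hypothetical stand-ins for a real training setup:

```python
# Fixed baseline that every experiment resets to.
BASELINE = {"lr": 1e-3, "loss": "mse", "augment": False}

def run_experiment(config):
    # Stand-in for a real training run; returns True if the bug reproduces.
    # The planted bug here depends only on the loss choice.
    return config["loss"] == "mse"

def isolate(candidates):
    """Try each candidate change in isolation and record whether the bug persists."""
    results = {}
    for key, value in candidates.items():
        config = dict(BASELINE)  # reset to baseline every time
        config[key] = value      # change exactly one thing
        results[key] = run_experiment(config)
    return results
```

Only the run where the loss changed stops reproducing the bug, so that single change is the one worth understanding — instead of a three-change commit nobody can explain.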
Pitfall 4: Context Window Degradation
As a conversation grows, Claude's ability to recall and reason about earlier context degrades. Quality drops subtly — Claude starts repeating itself, forgetting constraints, or making mistakes it wouldn't make in a fresh session.
Symptoms
- Claude re-introduces a bug it already fixed earlier in the session
- Claude forgets constraints you specified 20+ messages ago
- Responses become more generic and less specific to your codebase
- Claude stops referencing the investigation findings it generated earlier
How Propel Prevents This
The context-hygiene skill prompts /clear at regular intervals. Investigation findings are written to scratch/ directories with a living README, so context survives across clears. The retrospective skill captures session learnings before clearing. Nothing important lives only in the conversation — it's always persisted to files.
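A minimal sketch of that persistence pattern, assuming a simple scratch/ layout with one markdown note per topic. The layout and filenames are illustrative, not Propel's exact scheme:

```python
from pathlib import Path

def persist_finding(root, topic, text):
    """Append a finding to scratch/<topic>.md and refresh the living README index."""
    scratch = Path(root) / "scratch"
    scratch.mkdir(parents=True, exist_ok=True)

    note = scratch / f"{topic}.md"
    with note.open("a") as f:
        f.write(text.rstrip() + "\n")

    # Living README: an index of every note, rebuilt on each write,
    # so a fresh session can re-orient itself after a /clear.
    readme = scratch / "README.md"
    entries = sorted(p.name for p in scratch.glob("*.md") if p.name != "README.md")
    readme.write_text("# Scratch index\n" + "".join(f"- {e}\n" for e in entries))
    return note
```

The point is the invariant, not the helper: every finding lands on disk before the conversation is cleared, and the README tells the next session where to look.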
Pitfall 5: Displaced Fixes
The bug is in the loss function, but the "fix" is in the data pipeline. The code where the symptom appears is not always the code where the bug lives.
How Propel Prevents This
Debugger Mode's bug classification forces identifying where the bug actually is (specific file and line), not just where the symptom appears. The data-flow-tracer agent traces values through the pipeline to find the real source. Gate 4 requires "Root Cause" with specific line numbers — vague locations are rejected.
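The tracing idea can be sketched as checking an invariant after every pipeline stage, so the report names the stage where the value first goes wrong rather than the stage where it finally crashes. Stage names and the planted bug are hypothetical:

```python
def trace_pipeline(stages, value, invariant):
    """Return (stage_name, value) for the first stage whose output breaks the invariant."""
    for name, stage in stages:
        value = stage(value)
        if not invariant(value):
            return name, value  # the bug lives here, not at the final symptom
    return None, value

# Hypothetical four-stage pipeline; the symptom would surface in `batch`,
# but the bug is planted two stages earlier.
stages = [
    ("load",      lambda xs: list(xs)),
    ("normalize", lambda xs: [x / max(xs) for x in xs]),
    ("window",    lambda xs: xs[1:]),  # planted bug: silently drops an element
    ("batch",     lambda xs: xs),
]
```

Running the trace with the invariant "length stays 4" points at `window`, which is where the fix belongs — not a compensating `.reshape()` in `batch`.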
Pitfall 6: Skipping Investigation
The most natural instinct: "I know what I want, just build it." This skips the investigation phase where Claude would discover that the existing codebase already has a pattern for this, or that the approach you have in mind conflicts with established conventions.
What Gets Missed
- Existing utilities that already solve the problem (now you have duplicated code)
- Naming conventions that the new code should follow but doesn't
- Edge cases that the existing codebase handles but the new implementation ignores
- Integration points that will break when the new code is connected
How Propel Prevents This
Gate 0 (intake) and Gate 1 (investigation) cannot be skipped in Engineer Mode. Even if you're confident, the investigation phase catches mismatches between your mental model and the actual codebase state. This takes minutes and prevents hours of rework.
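One concrete slice of that investigation can be sketched as a scan for existing definitions before writing a new one. The regex and directory layout are illustrative; a real investigation would also check call sites, tests, and docs:

```python
import re
from pathlib import Path

def find_existing_defs(root, keyword):
    """Return (file, line_no, line) for every def/class whose name mentions keyword."""
    pattern = re.compile(
        rf"^\s*(?:def|class)\s+\w*{re.escape(keyword)}\w*", re.IGNORECASE
    )
    hits = []
    for path in Path(root).rglob("*.py"):
        for i, line in enumerate(path.read_text().splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.name, i, line.strip()))
    return hits
```

If this turns up a `normalize_rewards` helper before you write your own, you have avoided the duplicated-code trap in the list above for the cost of one search.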
Summary
Every pitfall follows the same pattern: skipping a step that feels unnecessary but prevents expensive mistakes. Propel's pipeline encodes these steps as mandatory gates so they can't be skipped by habit or impatience.
| Pitfall | Propel Prevention |
|---|---|
| Unconstrained Implementation | Q0/Q1 questioner gates force scoping and references |
| Symptom Patching | Gate 4 diagnosis format requires root cause evidence |
| Shotgun Debugging | Investigation skill + 3-strike limit |
| Context Window Degradation | Context hygiene skill + scratch/ persistence |
| Displaced Fixes | Data flow tracer + bug classification with line numbers |
| Skipping Investigation | Mandatory Gate 0 + Gate 1 in all implementation modes |