Regression Guard

Ensures that new or modified code doesn't silently break, alter, or degrade existing pipelines. A silent regression can invalidate weeks of experiments.

Overview

The Regression Guard audits new code changes to ensure backward compatibility. It traces every touchpoint between new and existing code to find unintended side effects — before they reach a training run.

Property | Details
Tools | Read, Grep, Glob, Bash
Auto-Dispatch | Yes, before merging any feature branch
Trigger | Any feature branch with code changes; training loop or optimizer modifications

Change Scope Analysis

Before anything else, the guard understands exactly what changed:
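
One way to establish the change scope is to group the changed files by change type. The sketch below is a minimal illustration, not the guard's actual implementation: it assumes the input is the tab-separated text printed by `git diff --name-status`, and the function name `summarize_changes` and the sample paths are hypothetical.

```python
from collections import defaultdict

# Status letters printed by `git diff --name-status`.
STATUS_NAMES = {"A": "added", "M": "modified", "D": "deleted", "R": "renamed"}

def summarize_changes(name_status_output: str) -> dict[str, list[str]]:
    """Group changed paths by change type."""
    summary = defaultdict(list)
    for line in name_status_output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status, paths = fields[0], fields[1:]
        # Renames look like "R100<TAB>old<TAB>new"; keep the new path.
        kind = STATUS_NAMES.get(status[0], "other")
        summary[kind].append(paths[-1])
    return dict(summary)

# Hypothetical diff output for illustration.
sample = "M\ttrain/loop.py\nA\tmodels/new_head.py\nR100\tdata/prep.py\tdata/preprocess.py\n"
print(summarize_changes(sample))
```

Modified and renamed files are the ones most likely to affect existing callers, so they are a natural starting point for the later checks.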

Interface Compatibility

Checks that existing code calling into modified code still works:
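
As an illustration of what such a check might look like for Python functions, the sketch below compares two signatures with the standard `inspect` module. The helper `breaking_changes` and the example loss functions are hypothetical, and the check is deliberately narrow: it only covers removed parameters, new required parameters, and reordering.

```python
import inspect

def breaking_changes(old_fn, new_fn) -> list[str]:
    """List ways in which calls written for old_fn could fail on new_fn."""
    old = inspect.signature(old_fn).parameters
    new = inspect.signature(new_fn).parameters
    problems = []
    for name in old:
        if name not in new:
            problems.append(f"parameter removed: {name}")
    for name, param in new.items():
        is_variadic = param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        )
        has_default = param.default is not inspect.Parameter.empty
        if name not in old and not has_default and not is_variadic:
            problems.append(f"new required parameter: {name}")
    # Positional callers also break if the shared parameters are reordered.
    if [n for n in old if n in new] != [n for n in new if n in old]:
        problems.append("parameter order changed")
    return problems

# Hypothetical before/after signatures.
def old_loss(pred, target, reduction="mean"): ...
def new_loss(pred, target, weight, reduction="mean"): ...
def ok_loss(pred, target, reduction="mean", weight=None): ...

print(breaking_changes(old_loss, new_loss))  # ['new required parameter: weight']
print(breaking_changes(old_loss, ok_loss))   # []
```

A signature check of this kind says nothing about changed default values or return types; those fall under behavioral checks.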

Pipeline Regression Tracing

Traces the full pipeline end-to-end to find where new code touches existing code:
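
A rough sketch of one tracing step, assuming the pipeline is Python source: given the names of changed functions, walk the syntax tree of existing code and report every call site. The function `find_call_sites` and the sample snippet are hypothetical; matching by bare name is a simplification that can produce false positives when unrelated functions share a name.

```python
import ast

def find_call_sites(source: str, changed_names: set[str]) -> list[tuple[int, str]]:
    """Return (line number, name) for each call to a changed function."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `name(...)` and `module.name(...)`.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in changed_names:
            hits.append((node.lineno, name))
    return sorted(hits)

# Hypothetical existing code that calls into changed functions.
existing_code = """\
batch = collate(samples)
loss = losses.compute_loss(model(batch), batch.y)
log(loss)
"""
print(find_call_sites(existing_code, {"compute_loss", "collate"}))
# [(1, 'collate'), (2, 'compute_loss')]
```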

Behavioral Equivalence

For code that was refactored but should behave identically, the guard checks for subtle behavioral changes:
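
A minimal sketch of an equivalence check, under the assumption that both the old and new implementations can be called side by side on the same inputs. The helper `first_divergence` and the two `normalize` functions are hypothetical; the example shows a "cleanup" that quietly changes a default epsilon.

```python
import math
import random

def first_divergence(old_fn, new_fn, inputs, rel_tol=1e-9):
    """Return the first input on which the two implementations disagree, else None."""
    for x in inputs:
        if not math.isclose(old_fn(x), new_fn(x), rel_tol=rel_tol, abs_tol=1e-12):
            return x
    return None

# Hypothetical refactor: same formula, but the default eps changed silently.
def old_normalize(values, eps=1e-8):
    total = sum(values)
    return values[0] / (total + eps)

def new_normalize(values, eps=1e-5):
    return values[0] / (sum(values) + eps)

rng = random.Random(0)  # fixed seed so the check is reproducible
inputs = [[rng.uniform(0.1, 1.0) for _ in range(4)] for _ in range(100)]

print(first_divergence(old_normalize, old_normalize, inputs))  # None
print(first_divergence(old_normalize, new_normalize, inputs) is not None)  # True
```

The difference here is on the order of one part in a million, small enough to pass a casual look at the loss curve and large enough to change results.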

Dependency and Side Effect Analysis

Displaced Fix Detection

Critical Check

One of the most dangerous patterns in research code: fixing a problem in module A by changing module B. This creates hidden coupling, makes the codebase fragile, and often introduces new bugs.

The guard looks for these displaced fix patterns:

Pattern | Example
Fix location doesn't match bug location | Bug is in the loss function but the "fix" changes data preprocessing. Bug is in the decoder but the "fix" normalizes the encoder output.
Compensating hacks | Adding * 0.5 upstream to counteract a doubled value downstream. Adding a transpose to "undo" a wrong axis convention from another module.
Workarounds in shared code | Fixing a problem specific to one model variant by changing shared infrastructure, forcing all other variants to live with the workaround.
Config-level fixes for code bugs | Adding loss_scale=0.5 because the loss is accidentally doubled somewhere. The correct fix is to fix the doubling.
Shape manipulation papering over mismatches | Adding reshapes, squeezes, or transposes at module boundaries when the real issue is one module produces the wrong shape.

For each change, the guard asks:

  1. Is the change in the same module/function where the problem originates?
  2. If not, why? Is there a legitimate architectural reason, or is this working around a root cause?
  3. Would this fix survive if the "other" module changed?
  4. Does this introduce an implicit contract between two modules that isn't enforced by tests?

Fix the Bug Where the Bug Is

If a fix changes module B to compensate for a problem in module A, reject it. When module A is later fixed properly, module B's workaround becomes a new bug. Always fix the root cause directly.
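
The failure mode can be shown with a toy example. Everything below is hypothetical (the function names, the `loss_scale` parameter as written here, and the numbers); it only illustrates how a compensating `loss_scale=0.5` turns into a bug once the doubled loss is fixed at its source.

```python
def mse_buggy(pred, target):
    # Bug: the result is accidentally multiplied by 2.
    return 2 * sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mse_fixed(pred, target):
    # Root-cause fix: the stray factor of 2 is gone.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def training_loss(loss_fn, pred, target, loss_scale=1.0):
    return loss_scale * loss_fn(pred, target)

pred, target = [1.0, 2.0], [0.0, 0.0]

# Displaced fix: compensate in the config instead of fixing the loss.
workaround = training_loss(mse_buggy, pred, target, loss_scale=0.5)
print(workaround)  # 2.5, looks correct

# Later the loss is fixed properly, but the workaround is still in the config.
after_root_fix = training_loss(mse_fixed, pred, target, loss_scale=0.5)
print(after_root_fix)  # 1.25, now silently half the true loss
```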

Conditional Path Verification

When new code adds branches (if/else, new model types, new loss terms):
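
One pattern the guard might look for is a new branch that stays inactive unless explicitly enabled. The sketch below assumes a hypothetical `aux_weight` option on a hypothetical `LossConfig`; with the default config, the function takes the legacy path unchanged.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LossConfig:
    # Hypothetical new option; the default keeps the pre-change behavior.
    aux_weight: float = 0.0

def total_loss(main_loss: float, aux_loss: float, cfg: LossConfig = LossConfig()) -> float:
    if cfg.aux_weight == 0.0:
        return main_loss  # legacy path, untouched
    return main_loss + cfg.aux_weight * aux_loss

# Default config: identical to the old behavior, even if aux_loss is NaN.
print(total_loss(1.5, float("nan")))                     # 1.5
# New behavior only when explicitly requested.
print(total_loss(1.5, 4.0, LossConfig(aux_weight=0.5)))  # 3.5
```

The explicit branch matters: computing `main_loss + 0.0 * aux_loss` unconditionally is not equivalent to the legacy path, because `0.0 * nan` and `0.0 * inf` are both NaN.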

Output Format

The guard produces a structured report containing:
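
The exact report fields are not specified here. Purely as an illustration of what a structured report could look like, the sketch below uses hypothetical `Finding` and `Report` classes with assumed fields (check name, location, severity, detail) and an overall verdict; none of these names come from the guard itself.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class Finding:
    check: str     # which check raised it, e.g. "interface_compatibility"
    location: str  # file and symbol the finding refers to
    severity: str  # assumed levels: "blocker", "warning", "info"
    detail: str

@dataclass
class Report:
    findings: list[Finding] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # Any blocker fails the whole report.
        return "fail" if any(f.severity == "blocker" for f in self.findings) else "pass"

    def to_json(self) -> str:
        payload = {"verdict": self.verdict, "findings": [asdict(f) for f in self.findings]}
        return json.dumps(payload, indent=2)

report = Report()
report.findings.append(
    Finding("displaced_fix", "config/train.yaml: loss_scale", "blocker",
            "loss_scale=0.5 compensates for a doubled loss elsewhere")
)
print(report.to_json())
```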

Core Principle

The default behavior must not change. If someone runs the exact same config and command as before the change, they must get the same results. New behavior should only activate when explicitly requested. "It still runs" is not "it still works" — a pipeline can run without errors but produce silently different results.
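
A minimal sketch of how "same config, same results" could be checked: run the pipeline with a fixed seed and compare a fingerprint of its outputs against one recorded before the change. `run_pipeline` is a stand-in for a real training run and `fingerprint` is a hypothetical helper; both are assumptions for illustration.

```python
import hashlib
import random
import struct

def run_pipeline(seed: int, steps: int = 5) -> list[float]:
    """Stand-in for a real training run: deterministic given the seed."""
    rng = random.Random(seed)
    loss, history = 1.0, []
    for _ in range(steps):
        loss *= 0.9 + 0.01 * rng.random()
        history.append(loss)
    return history

def fingerprint(values: list[float]) -> str:
    """Hash the exact bit patterns, so even tiny numeric drift changes the digest."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()

baseline = fingerprint(run_pipeline(seed=0))   # recorded before the change
candidate = fingerprint(run_pipeline(seed=0))  # recomputed after the change
assert baseline == candidate, "default behavior changed"
```

A bit-exact comparison is the strictest form of this check. Where exact reproducibility is not achievable, for example with nondeterministic hardware kernels, a tolerance-based comparison of the same outputs may be a better fit.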