Env Researcher

Deeply researches the simulation environments used in a project — not at a surface level, but at the level needed to write correct code against the environment's API and physics.

Overview

Claude's training data contains averaged, potentially outdated information about simulation environments. The Env Researcher replaces that with precise, current documentation. This is critical because environment APIs have subtle behaviors — observation ordering, action scaling, frame conventions, reset semantics — that are not obvious from code and cause silent bugs when assumed incorrectly.

| Property | Details |
| --- | --- |
| Tools | Read, Grep, Glob, WebSearch, WebFetch, Task |
| Auto-Dispatch | On demand — at investigation start or when asking about environment behavior |
| Trigger | Environment integration, wrapper changes, questions about observation/action spaces or physics parameters |

Environment Identification

The researcher scans the codebase to identify which environments are in use:

| Import Pattern | Environment |
| --- | --- |
| `mujoco`, `mujoco.mjx` | MuJoCo / MJX (JAX-accelerated MuJoCo) |
| `robosuite` | robosuite (manipulation tasks) |
| `metaworld` | Meta-World (multi-task manipulation) |
| `isaacgym`, `isaaclab`, `omni.isaac` | Isaac Gym / Isaac Sim / Isaac Lab |
| `dm_control` | DeepMind Control Suite |
| `gymnasium`, `gym` | Gymnasium / OpenAI Gym |
| `brax` | Brax (JAX-based physics) |
| `pybullet` | PyBullet |

Also identifies: environment version (from pinned dependencies), specific tasks/scenes, custom wrappers, and body models (rodent, fly, humanoid, custom MJCF/URDF).
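The import scan above can be sketched as a simple regex pass over source text. This is a minimal illustration, not the researcher's actual implementation — the pattern-to-environment mapping mirrors the table, and the function name is my own:

```python
import re

# Import-name roots mapped to environment labels (mirrors the table above).
ENV_PATTERNS = {
    "mujoco": "MuJoCo / MJX",
    "robosuite": "robosuite",
    "metaworld": "Meta-World",
    "isaacgym": "Isaac Gym / Isaac Lab",
    "isaaclab": "Isaac Gym / Isaac Lab",
    "dm_control": "DeepMind Control Suite",
    "gymnasium": "Gymnasium / OpenAI Gym",
    "gym": "Gymnasium / OpenAI Gym",
    "brax": "Brax",
    "pybullet": "PyBullet",
}

# Matches `import foo.bar` and `from foo import ...` at line start.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_][\w.]*)", re.MULTILINE)

def detect_environments(source: str) -> set:
    """Return the set of known environments imported by a source file."""
    found = set()
    for match in IMPORT_RE.finditer(source):
        root = match.group(1).split(".")[0]  # 'mujoco.mjx' -> 'mujoco'
        if root in ENV_PATTERNS:
            found.add(ENV_PATTERNS[root])
    return found
```

For example, `detect_environments("import mujoco.mjx\nfrom gymnasium import spaces")` yields both `"MuJoCo / MJX"` and `"Gymnasium / OpenAI Gym"`.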

Documentation Deep Dive

For each identified environment, the researcher fetches and reads official documentation thoroughly. Key areas for each:

- MuJoCo / MJX
- robosuite
- Meta-World
- Isaac Gym / Isaac Lab
- dm_control
- Brax

Universal Research Checklist

For any environment, the researcher always researches:

  1. Observation space — what each element means, ordering, normalization, coordinate frames
  2. Action space — dimensions, meaning, scaling, clipping behavior
  3. Reset semantics — what state is randomized, how, initial distribution
  4. Stepping — what step() actually does (sub-steps, integration method, contact resolution)
  5. Reward — how it's computed, dense/sparse, any shaping
  6. Termination — what triggers done, truncation vs termination distinction
  7. Physics parameters — timestep, gravity, friction defaults, solver iterations
  8. API gotchas — version-specific behavior changes, deprecated features
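Item 2 is a frequent failure point: many policies emit actions in [-1, 1] while actuators expect physical ranges, and whether the environment rescales or clips for you varies. A minimal sketch of the affine rescaling (the function and ranges are illustrative, not from any specific environment):

```python
def rescale_action(action: float, low: float, high: float) -> float:
    """Map a policy output in [-1, 1] to the actuator range [low, high].

    Clipping first makes out-of-range policy outputs safe; whether the
    environment clips internally is exactly the detail to verify in the docs.
    """
    clipped = max(-1.0, min(1.0, action))
    return low + (clipped + 1.0) * 0.5 * (high - low)
```

For instance, `rescale_action(0.0, 0.0, 2.0)` maps the midpoint of the policy range to `1.0`, the midpoint of the actuator range.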

Codebase Cross-Reference

After reading the docs, the researcher verifies the codebase against them.

Implementation Gotchas

The researcher compiles environment-specific pitfalls. Examples:

Common RL Environment Gotchas

- Auto-reset envs return the NEW episode's first observation on the terminal step, not the final observation. Use info["final_observation"] to get the actual last obs.
- Vectorized envs may have different reset semantics than single envs.
- Frame stacking wrappers change observation shape and semantics.
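The auto-reset pitfall can be shown with a toy environment. This class is entirely hypothetical; it only mimics the convention of stashing the true last observation in `info["final_observation"]`:

```python
class ToyAutoResetEnv:
    """Toy env: obs counts steps; the episode terminates at obs == 3, then auto-resets."""

    def reset(self):
        self.obs = 0
        return self.obs

    def step(self, action):
        self.obs += 1
        terminated = self.obs == 3
        info = {}
        if terminated:
            # Mimic auto-reset: the returned obs is the NEW episode's first obs;
            # the true final obs is only available via info.
            info["final_observation"] = self.obs
            self.obs = 0
        return self.obs, terminated, info

env = ToyAutoResetEnv()
obs = env.reset()
for _ in range(3):
    obs, terminated, info = env.step(None)

# On the terminal step, `obs` is already the next episode's start state.
last_obs = info["final_observation"] if terminated else obs
```

Here `obs` ends up as `0` (the new episode's start) while `last_obs` is `3`; computing returns or bootstrap values from `obs` on the terminal step would silently use the wrong state.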

| Environment | Common Gotcha |
| --- | --- |
| MuJoCo | Quaternion convention is `[w, x, y, z]` — many frameworks use `[x, y, z, w]` |
| MuJoCo | `mj_step` does `mj_step1` + `mj_step2` — calling them separately changes when forces apply |
| MJX | `mjx.step()` does not support all MuJoCo features — check actuator/sensor compatibility |
| robosuite | OSC controller clips internally — policy output range doesn't map linearly to end-effector movement |
| robosuite | `env.step()` may call `physics.step()` multiple times per control step |
| Isaac | GPU vs CPU pipeline behavior differences in contact handling |
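The quaternion-convention gotcha is worth a concrete guard at every framework boundary. A minimal conversion sketch between the two layouts (function names are my own):

```python
def wxyz_to_xyzw(q):
    """Convert MuJoCo-style [w, x, y, z] to [x, y, z, w] (SciPy/ROS-style)."""
    w, x, y, z = q
    return [x, y, z, w]

def xyzw_to_wxyz(q):
    """Convert [x, y, z, w] back to MuJoCo-style [w, x, y, z]."""
    x, y, z, w = q
    return [w, x, y, z]

# The identity rotation looks different in each convention — an easy sanity check
# to run on any quaternion crossing a library boundary.
identity_wxyz = [1.0, 0.0, 0.0, 0.0]
identity_xyzw = wxyz_to_xyzw(identity_wxyz)  # [0.0, 0.0, 0.0, 1.0]
```

Mixing up the two conventions often goes unnoticed near the identity rotation and only surfaces as wrong orientations at larger angles, which is why it is a classic silent bug.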

Output Format

The researcher produces a structured report of its findings.

Key Principles

- Read the actual docs, don't guess — training data may be outdated.
- Observation semantics are everything — the #1 source of silent RL bugs is misinterpreting observation elements. Action scaling is #2.
- Version matters — Gymnasium vs gym, MuJoCo 3.x vs 2.x; APIs change between versions.
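The Gymnasium-vs-gym point shows up concretely in step() arity: classic gym returns (obs, reward, done, info) while Gymnasium returns (obs, reward, terminated, truncated, info). A defensive shim, sketched here as an assumption rather than code from either library:

```python
def step_compat(env, action):
    """Normalize env.step() to the 5-tuple Gymnasium convention.

    Classic gym's single `done` flag conflates termination and truncation;
    lacking more information, we conservatively treat it as termination.
    """
    result = env.step(action)
    if len(result) == 5:
        return result  # already (obs, reward, terminated, truncated, info)
    obs, reward, done, info = result
    return obs, reward, done, False, info
```

Treating old-style `done` as termination is itself a judgment call: if the episode was actually time-limit truncated, bootstrapping logic that keys off `terminated` will be subtly wrong, which is exactly why the version distinction matters.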