# Env Researcher
Deeply researches the simulation environments used in a project — not at a surface level, but at the level needed to write correct code against the environment's API and physics.
## Overview
Claude's training data contains averaged, potentially outdated information about simulation environments. The Env Researcher replaces that with precise, current documentation. This is critical because environment APIs have subtle behaviors — observation ordering, action scaling, frame conventions, reset semantics — that are not obvious from code and cause silent bugs when assumed incorrectly.
| Property | Details |
|---|---|
| Tools | Read, Grep, Glob, WebSearch, WebFetch, Task |
| Auto-Dispatch | On demand — at investigation start or when asking about environment behavior |
| Trigger | Environment integration, wrapper changes, questions about observation/action spaces or physics parameters |
## Environment Identification
The researcher scans the codebase to identify which environments are in use:
| Import Pattern | Environment |
|---|---|
| `mujoco`, `mujoco.mjx` | MuJoCo / MJX (JAX-accelerated MuJoCo) |
| `robosuite` | robosuite (manipulation tasks) |
| `metaworld` | Meta-World (multi-task manipulation) |
| `isaacgym`, `isaaclab`, `omni.isaac` | Isaac Gym / Isaac Sim / Isaac Lab |
| `dm_control` | DeepMind Control Suite |
| `gymnasium`, `gym` | Gymnasium / OpenAI Gym |
| `brax` | Brax (JAX-based physics) |
| `pybullet` | PyBullet |
Also identifies: environment version (from pinned dependencies), specific tasks/scenes, custom wrappers, and body models (rodent, fly, humanoid, custom MJCF/URDF).
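As a rough sketch of how such an import scan might work (the regex, the mapping, and the function names here are illustrative, not part of any real tool):

```python
import re
from pathlib import Path

# Map of top-level import names to environment names (mirrors the table above).
# Note: `omni.isaac` resolves to the top-level package `omni`.
ENV_IMPORTS = {
    "mujoco": "MuJoCo / MJX",
    "robosuite": "robosuite",
    "metaworld": "Meta-World",
    "isaacgym": "Isaac Gym / Isaac Sim / Isaac Lab",
    "isaaclab": "Isaac Gym / Isaac Sim / Isaac Lab",
    "omni": "Isaac Gym / Isaac Sim / Isaac Lab",
    "dm_control": "DeepMind Control Suite",
    "gymnasium": "Gymnasium / OpenAI Gym",
    "gym": "Gymnasium / OpenAI Gym",
    "brax": "Brax",
    "pybullet": "PyBullet",
}

# Matches "import foo" and "from foo import ...", capturing the top-level name.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)

def detect_envs(source: str) -> set[str]:
    """Return the environment names implied by a module's import statements."""
    found = set()
    for match in IMPORT_RE.finditer(source):
        top_level = match.group(1)
        if top_level in ENV_IMPORTS:
            found.add(ENV_IMPORTS[top_level])
    return found

def scan_project(root: str) -> set[str]:
    """Scan every .py file under root and aggregate detected environments."""
    envs: set[str] = set()
    for path in Path(root).rglob("*.py"):
        envs |= detect_envs(path.read_text(errors="ignore"))
    return envs
```

A real implementation would also inspect pinned dependency files to recover exact versions, since behavior differs across releases.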
## Documentation Deep Dive
For each identified environment, the researcher fetches and reads official documentation thoroughly. Key areas for each:
### MuJoCo / MJX
- `mjModel`/`mjData` structure, actuator types, sensor API, contact parameters, solver options
- MJX-specific: what's supported vs not in JAX compilation, stepping semantics
- MJCF model format: actuator definitions, tendon routing, equality constraints
- Known pitfalls: quaternion conventions (`wxyz` vs `xyzw`), frame conventions, contact softness defaults
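The quaternion-ordering pitfall is easy to guard against with a small conversion helper. A minimal sketch (the function names are ours, and the code assumes the last axis holds the quaternion, so it also works on batches):

```python
import numpy as np

def wxyz_to_xyzw(q: np.ndarray) -> np.ndarray:
    """Convert a MuJoCo-style [w, x, y, z] quaternion to [x, y, z, w] order."""
    q = np.asarray(q)
    return np.concatenate([q[..., 1:], q[..., :1]], axis=-1)

def xyzw_to_wxyz(q: np.ndarray) -> np.ndarray:
    """Inverse conversion: [x, y, z, w] back to MuJoCo's [w, x, y, z]."""
    q = np.asarray(q)
    return np.concatenate([q[..., -1:], q[..., :-1]], axis=-1)

# Identity rotation: MuJoCo writes it as [1, 0, 0, 0];
# SciPy and many RL frameworks expect [0, 0, 0, 1].
identity_wxyz = np.array([1.0, 0.0, 0.0, 0.0])
print(wxyz_to_xyzw(identity_wxyz))  # → [0. 0. 0. 1.]
```

Passing the wrong convention does not crash anything; it silently corrupts every orientation in the observation, which is exactly why it belongs on the pitfall list.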
### robosuite
- Observation keys and meanings, action dimensions per robot, controller types (OSC, joint velocity)
- Known pitfalls: observation normalization assumptions, action space clipping, gripper action conventions
### Meta-World
- Task distribution, goal representation, observation/action space per task, success metrics
- Known pitfalls: goal-conditioned vs fixed-goal, observation space changes between versions
### Isaac Gym / Isaac Lab
- Tensor API vs scene API, GPU pipeline, observation/action buffers, domain randomization API
- Known pitfalls: GPU vs CPU pipeline behavior differences, reset indexing, parallel env semantics
### dm_control
- Physics timestep vs control timestep, observation spec, action spec, task rewards
- Known pitfalls: `physics.data` vs `physics.named.data`, `time_limit` behavior
### Brax
- `brax.envs` API, pipeline backends (Spring, Positional, MJX), state representation
- Known pitfalls: backend differences in contact handling, auto-reset semantics
## Universal Research Checklist
For any environment, the researcher always researches:
- Observation space — what each element means, ordering, normalization, coordinate frames
- Action space — dimensions, meaning, scaling, clipping behavior
- Reset semantics — what state is randomized, how, initial distribution
- Stepping — what `step()` actually does (sub-steps, integration method, contact resolution)
- Reward — how it's computed, dense/sparse, any shaping
- Termination — what triggers done, truncation vs termination distinction
- Physics parameters — timestep, gravity, friction defaults, solver iterations
- API gotchas — version-specific behavior changes, deprecated features
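Several checklist items can be verified empirically rather than trusted from documentation alone. Below is a minimal sketch of such a probe, demonstrated against a hand-rolled dummy env; both `DummyEnv` and `probe_env` are illustrative, and the probe assumes the Gymnasium-style five-tuple `step()` return:

```python
import numpy as np

class DummyEnv:
    """Stand-in for a Gymnasium-style env, used only to demo the probe below."""
    def __init__(self):
        self.observation_shape = (3,)
        self._t = 0

    def reset(self, seed=None):
        self._t = 0
        return np.zeros(self.observation_shape), {}

    def step(self, action):
        self._t += 1
        obs = np.full(self.observation_shape, float(self._t))
        terminated = False
        truncated = self._t >= 5  # time-limit truncation, not task failure
        return obs, 1.0, terminated, truncated, {}

def probe_env(env, n_steps=10):
    """Empirically record obs shape, rewards, and when the episode ends."""
    obs, _ = env.reset(seed=0)
    report = {"obs_shape": np.asarray(obs).shape, "rewards": [], "ended_at": None}
    for t in range(n_steps):
        action = np.zeros(1)  # placeholder; a real probe should sample the action space
        obs, reward, terminated, truncated, info = env.step(action)
        report["rewards"].append(reward)
        if terminated or truncated:
            report["ended_at"] = t + 1
            break
    return report
```

Running the probe against the real env and diffing the result against the documentation is a cheap way to catch version drift in observation shapes or episode limits.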
## Codebase Cross-Reference
After reading docs, the researcher verifies the codebase against them:
- Observation usage matches docs — if the env returns `obs[0:3]` as position and the code treats it as velocity, flags it
- Action scaling matches docs — if the env expects actions in `[-1, 1]` but the policy outputs unbounded values, flags it
- Reset handling matches docs — auto-reset correctness, truncation vs termination distinction
- Wrapper chain is correct — traces full wrapper stack and verifies ordering
- Physics parameters match intent — solver iterations, timestep, contact parameters appropriate for the task
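The action-scaling check in particular is cheap to automate. A sketch of one way to do it (the helper name and `tolerance` parameter are our own):

```python
import numpy as np

def check_action_scaling(policy_outputs, low, high, tolerance=0.0):
    """Return the fraction of policy outputs outside the env's action bounds.

    Anything > 0 suggests the policy head is missing a tanh squash or a
    rescale layer before env.step().
    """
    outputs = np.asarray(policy_outputs)
    out_of_bounds = (outputs < low - tolerance) | (outputs > high + tolerance)
    return out_of_bounds.mean()

# A raw Gaussian policy head will routinely exceed [-1, 1]:
rng = np.random.default_rng(0)
raw = rng.normal(0.0, 2.0, size=(1000, 6))
squashed = np.tanh(raw)  # tanh maps everything strictly into (-1, 1)
```

Whether out-of-bounds actions are clipped, wrapped, or rejected is exactly the kind of env-specific behavior the docs must settle.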
## Implementation Gotchas
The researcher compiles environment-specific pitfalls. Examples:
- Auto-reset envs return the NEW episode's first observation on the terminal step, not the final observation — use `info["final_observation"]` to get the actual last obs.
- Vectorized envs may have different reset semantics than single envs.
- Frame stacking wrappers change observation shape and semantics.
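The auto-reset behavior can be demonstrated with a mock env. This sketch follows Gymnasium's pre-1.0 auto-reset convention (`info["final_observation"]`); `MockAutoResetEnv` is our own stand-in, not a real API, and newer Gymnasium versions changed this convention:

```python
import numpy as np

class MockAutoResetEnv:
    """Mimics auto-reset: the terminal step returns the NEW episode's first
    observation and stashes the real last obs in info["final_observation"]."""
    def __init__(self, episode_len=3):
        self.episode_len = episode_len
        self._t = 0

    def step(self, action):
        self._t += 1
        obs = np.array([float(self._t)])
        if self._t >= self.episode_len:
            final_obs = obs                # the episode's true last observation
            self._t = 0
            obs = np.array([0.0])          # auto-reset: first obs of next episode
            return obs, 0.0, True, False, {"final_observation": final_obs}
        return obs, 0.0, False, False, {}

def last_obs_of_episode(env):
    """Correct pattern: read the terminal obs from info, not from step's return."""
    while True:
        obs, reward, terminated, truncated, info = env.step(None)
        if terminated or truncated:
            return info["final_observation"]
```

Using `obs` instead of `info["final_observation"]` here would silently train value targets on the wrong state, which is why this gotcha ranks so high.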
| Environment | Common Gotcha |
|---|---|
| MuJoCo | Quaternion convention is [w, x, y, z] — many frameworks use [x, y, z, w] |
| MuJoCo | mj_step does mj_step1 + mj_step2 — calling them separately changes when forces apply |
| MJX | mjx.step() does not support all MuJoCo features — check actuator/sensor compatibility |
| robosuite | OSC controller clips internally — policy output range doesn't map linearly to end-effector movement |
| robosuite | env.step() may call physics.step() multiple times per control step |
| Isaac | GPU vs CPU pipeline behavior differences in contact handling |
## Output Format
The researcher produces a structured report containing:
- Environment Stack — base environment, version, tasks, body model, wrappers, framework integration
- API Reference Summary — observation space table, action space table, step semantics, reset behavior, reward structure
- Codebase Cross-Reference — checklist of verified/flagged items
- Implementation Gotchas — specific pitfalls with evidence from docs
- Documentation Sources — links to all referenced docs pages
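A minimal sketch of how that report might be represented as a data structure (the field names are illustrative, not a fixed schema):

```python
from dataclasses import dataclass, field

@dataclass
class EnvResearchReport:
    """Container mirroring the five report sections listed above."""
    environment_stack: dict = field(default_factory=dict)  # env, version, tasks, wrappers
    api_summary: dict = field(default_factory=dict)        # obs/action tables, step/reset semantics
    cross_reference: list = field(default_factory=list)    # verified/flagged checklist items
    gotchas: list = field(default_factory=list)            # pitfalls with doc evidence
    sources: list = field(default_factory=list)            # documentation URLs

    def flag(self, item: str) -> None:
        """Record a codebase/docs mismatch for the cross-reference section."""
        self.cross_reference.append(("FLAGGED", item))
```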
- Read the actual docs, don't guess — training data may be outdated.
- Observation semantics are everything — misinterpreting observation elements is the #1 source of silent RL bugs; action scaling is #2.
- Version matters — Gymnasium vs gym, MuJoCo 3.x vs 2.x; APIs change between versions.