# Env Researcher
Deeply researches the simulation environments used in a project — not at a surface level, but at the level needed to write correct code against the environment's API and physics.
## Overview
Claude's training data contains averaged, potentially outdated information about simulation environments. The Env Researcher replaces that with precise, current documentation. This is critical because environment APIs have subtle behaviors — observation ordering, action scaling, frame conventions, reset semantics — that are not obvious from code and cause silent bugs when assumed incorrectly.
| Property | Details |
|---|---|
| Tools | Read, Grep, Glob, WebSearch, WebFetch, Task |
| Auto-Dispatch | On demand — at investigation start or when asking about environment behavior |
| Trigger | Environment integration, wrapper changes, questions about observation/action spaces or physics parameters |
## Environment Identification
The researcher scans the codebase to identify which environments are in use:
| Import Pattern | Environment |
|---|---|
| `mujoco`, `mujoco.mjx` | MuJoCo / MJX (JAX-accelerated MuJoCo) |
| `robosuite` | robosuite (manipulation tasks) |
| `metaworld` | Meta-World (multi-task manipulation) |
| `isaacgym`, `isaaclab`, `omni.isaac` | Isaac Gym / Isaac Sim / Isaac Lab |
| `dm_control` | DeepMind Control Suite |
| `gymnasium`, `gym` | Gymnasium / OpenAI Gym |
| `brax` | Brax (JAX-based physics) |
| `pybullet` | PyBullet |
Also identifies: environment version (from pinned dependencies), specific tasks/scenes, custom wrappers, and body models (rodent, fly, humanoid, custom MJCF/URDF).
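As a rough sketch of how such an import scan might work (the regex, the mapping, and the function names here are illustrative, not part of any real tool):

```python
import re
from pathlib import Path

# Map of top-level import names to environment names (mirrors the table above).
# Note: `omni.isaac` resolves to the top-level package `omni`.
ENV_IMPORTS = {
    "mujoco": "MuJoCo / MJX",
    "robosuite": "robosuite",
    "metaworld": "Meta-World",
    "isaacgym": "Isaac Gym / Isaac Sim / Isaac Lab",
    "isaaclab": "Isaac Gym / Isaac Sim / Isaac Lab",
    "omni": "Isaac Gym / Isaac Sim / Isaac Lab",
    "dm_control": "DeepMind Control Suite",
    "gymnasium": "Gymnasium / OpenAI Gym",
    "gym": "Gymnasium / OpenAI Gym",
    "brax": "Brax",
    "pybullet": "PyBullet",
}

# Matches "import foo" and "from foo import ...", capturing the top-level name.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)

def detect_envs(source: str) -> set[str]:
    """Return the environment names implied by a module's import statements."""
    found = set()
    for match in IMPORT_RE.finditer(source):
        top_level = match.group(1)
        if top_level in ENV_IMPORTS:
            found.add(ENV_IMPORTS[top_level])
    return found

def scan_project(root: str) -> set[str]:
    """Scan every .py file under root and aggregate detected environments."""
    envs: set[str] = set()
    for path in Path(root).rglob("*.py"):
        envs |= detect_envs(path.read_text(errors="ignore"))
    return envs
```

A real implementation would also inspect pinned dependency files to recover exact versions, since behavior differs across releases.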
## Documentation Deep Dive
For each identified environment, the researcher fetches and reads official documentation thoroughly. Key areas for each:
### MuJoCo / MJX
- `mjModel`/`mjData` structure, actuator types, sensor API, contact parameters, solver options
- MJX-specific: what's supported vs not in JAX compilation, stepping semantics
- MJCF model format: actuator definitions, tendon routing, equality constraints
- Known pitfalls: quaternion conventions (`wxyz` vs `xyzw`), frame conventions, contact softness defaults
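The quaternion-ordering pitfall is easy to guard against with a small conversion helper. A minimal sketch (the function names are ours, and the code assumes the last axis holds the quaternion, so it also works on batches):

```python
import numpy as np

def wxyz_to_xyzw(q: np.ndarray) -> np.ndarray:
    """Convert a MuJoCo-style [w, x, y, z] quaternion to [x, y, z, w] order."""
    q = np.asarray(q)
    return np.concatenate([q[..., 1:], q[..., :1]], axis=-1)

def xyzw_to_wxyz(q: np.ndarray) -> np.ndarray:
    """Inverse conversion: [x, y, z, w] back to MuJoCo's [w, x, y, z]."""
    q = np.asarray(q)
    return np.concatenate([q[..., -1:], q[..., :-1]], axis=-1)

# Identity rotation: MuJoCo writes it as [1, 0, 0, 0];
# SciPy and many RL frameworks expect [0, 0, 0, 1].
identity_wxyz = np.array([1.0, 0.0, 0.0, 0.0])
print(wxyz_to_xyzw(identity_wxyz))  # → [0. 0. 0. 1.]
```

Passing the wrong convention does not crash anything; it silently corrupts every orientation in the observation, which is exactly why it belongs on the pitfall list.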
### robosuite
- Observation keys and meanings, action dimensions per robot, controller types (OSC, joint velocity)
- Known pitfalls: observation normalization assumptions, action space clipping, gripper action conventions
### Meta-World
- Task distribution, goal representation, observation/action space per task, success metrics
- Known pitfalls: goal-conditioned vs fixed-goal, observation space changes between versions
### Isaac Gym / Isaac Lab
- Tensor API vs scene API, GPU pipeline, observation/action buffers, domain randomization API
- Known pitfalls: GPU vs CPU pipeline behavior differences, reset indexing, parallel env semantics
### dm_control
- Physics timestep vs control timestep, observation spec, action spec, task rewards
- Known pitfalls: `physics.data` vs `physics.named.data`, `time_limit` behavior
### Brax
- `brax.envs` API, pipeline backends (Spring, Positional, MJX), state representation
- Known pitfalls: backend differences in contact handling, auto-reset semantics
## Universal Research Checklist
For any environment, the researcher always researches:
- Observation space — what each element means, ordering, normalization, coordinate frames
- Action space — dimensions, meaning, scaling, clipping behavior
- Reset semantics — what state is randomized, how, initial distribution
- Stepping — what `step()` actually does (sub-steps, integration method, contact resolution)
- Reward — how it's computed, dense/sparse, any shaping
- Termination — what triggers done, truncation vs termination distinction
- Physics parameters — timestep, gravity, friction defaults, solver iterations
- API gotchas — version-specific behavior changes, deprecated features
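Several checklist items can be verified empirically rather than trusted from documentation alone. Below is a minimal sketch of such a probe, demonstrated against a hand-rolled dummy env; both `DummyEnv` and `probe_env` are illustrative, and the probe assumes the Gymnasium-style five-tuple `step()` return:

```python
import numpy as np

class DummyEnv:
    """Stand-in for a Gymnasium-style env, used only to demo the probe below."""
    def __init__(self):
        self.observation_shape = (3,)
        self._t = 0

    def reset(self, seed=None):
        self._t = 0
        return np.zeros(self.observation_shape), {}

    def step(self, action):
        self._t += 1
        obs = np.full(self.observation_shape, float(self._t))
        terminated = False
        truncated = self._t >= 5  # time-limit truncation, not task failure
        return obs, 1.0, terminated, truncated, {}

def probe_env(env, n_steps=10):
    """Empirically record obs shape, rewards, and when the episode ends."""
    obs, _ = env.reset(seed=0)
    report = {"obs_shape": np.asarray(obs).shape, "rewards": [], "ended_at": None}
    for t in range(n_steps):
        action = np.zeros(1)  # placeholder; a real probe should sample the action space
        obs, reward, terminated, truncated, info = env.step(action)
        report["rewards"].append(reward)
        if terminated or truncated:
            report["ended_at"] = t + 1
            break
    return report
```

Running the probe against the real env and diffing the result against the documentation is a cheap way to catch version drift in observation shapes or episode limits.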
## Codebase Cross-Reference
After reading docs, the researcher verifies the codebase against them:
- Observation usage matches docs — if the env returns `obs[0:3]` as position and the code treats it as velocity, flags it
- Action scaling matches docs — if the env expects actions in `[-1, 1]` but the policy outputs unbounded values, flags it
- Reset handling matches docs — auto-reset correctness, truncation vs termination distinction
- Wrapper chain is correct — traces full wrapper stack and verifies ordering
- Physics parameters match intent — solver iterations, timestep, contact parameters appropriate for the task
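The action-scaling check in particular is cheap to automate. A sketch of one way to do it (the helper name and `tolerance` parameter are our own):

```python
import numpy as np

def check_action_scaling(policy_outputs, low, high, tolerance=0.0):
    """Return the fraction of policy outputs outside the env's action bounds.

    Anything > 0 suggests the policy head is missing a tanh squash or a
    rescale layer before env.step().
    """
    outputs = np.asarray(policy_outputs)
    out_of_bounds = (outputs < low - tolerance) | (outputs > high + tolerance)
    return out_of_bounds.mean()

# A raw Gaussian policy head will routinely exceed [-1, 1]:
rng = np.random.default_rng(0)
raw = rng.normal(0.0, 2.0, size=(1000, 6))
squashed = np.tanh(raw)  # tanh maps everything strictly into (-1, 1)
```

Whether out-of-bounds actions are clipped, wrapped, or rejected is exactly the kind of env-specific behavior the docs must settle.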
## Implementation Gotchas
The researcher compiles environment-specific pitfalls. Examples:
- Auto-reset envs return the NEW episode's first observation on the terminal step, not the final observation — use `info["final_observation"]` to get the actual last obs.
- Vectorized envs may have different reset semantics than single envs.
- Frame stacking wrappers change observation shape and semantics.
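The auto-reset behavior can be demonstrated with a mock env. This sketch follows Gymnasium's pre-1.0 auto-reset convention (`info["final_observation"]`); `MockAutoResetEnv` is our own stand-in, not a real API, and newer Gymnasium versions changed this convention:

```python
import numpy as np

class MockAutoResetEnv:
    """Mimics auto-reset: the terminal step returns the NEW episode's first
    observation and stashes the real last obs in info["final_observation"]."""
    def __init__(self, episode_len=3):
        self.episode_len = episode_len
        self._t = 0

    def step(self, action):
        self._t += 1
        obs = np.array([float(self._t)])
        if self._t >= self.episode_len:
            final_obs = obs                # the episode's true last observation
            self._t = 0
            obs = np.array([0.0])          # auto-reset: first obs of next episode
            return obs, 0.0, True, False, {"final_observation": final_obs}
        return obs, 0.0, False, False, {}

def last_obs_of_episode(env):
    """Correct pattern: read the terminal obs from info, not from step's return."""
    while True:
        obs, reward, terminated, truncated, info = env.step(None)
        if terminated or truncated:
            return info["final_observation"]
```

Using `obs` instead of `info["final_observation"]` here would silently train value targets on the wrong state, which is why this gotcha ranks so high.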
| Environment | Common Gotcha |
|---|---|
| MuJoCo | Quaternion convention is [w, x, y, z] — many frameworks use [x, y, z, w] |
| MuJoCo | mj_step does mj_step1 + mj_step2 — calling them separately changes when forces apply |
| MJX | mjx.step() does not support all MuJoCo features — check actuator/sensor compatibility |
| robosuite | OSC controller clips internally — policy output range doesn't map linearly to end-effector movement |
| robosuite | env.step() may call physics.step() multiple times per control step |
| Isaac | GPU vs CPU pipeline behavior differences in contact handling |
## Output Format
The researcher produces a structured report containing:
- Environment Stack — base environment, version, tasks, body model, wrappers, framework integration
- API Reference Summary — observation space table, action space table, step semantics, reset behavior, reward structure
- Codebase Cross-Reference — checklist of verified/flagged items
- Implementation Gotchas — specific pitfalls with evidence from docs
- Documentation Sources — links to all referenced docs pages
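A minimal sketch of how that report might be represented as a data structure (the field names are illustrative, not a fixed schema):

```python
from dataclasses import dataclass, field

@dataclass
class EnvResearchReport:
    """Container mirroring the five report sections listed above."""
    environment_stack: dict = field(default_factory=dict)  # env, version, tasks, wrappers
    api_summary: dict = field(default_factory=dict)        # obs/action tables, step/reset semantics
    cross_reference: list = field(default_factory=list)    # verified/flagged checklist items
    gotchas: list = field(default_factory=list)            # pitfalls with doc evidence
    sources: list = field(default_factory=list)            # documentation URLs

    def flag(self, item: str) -> None:
        """Record a codebase/docs mismatch for the cross-reference section."""
        self.cross_reference.append(("FLAGGED", item))
```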
- Read the actual docs, don't guess — training data may be outdated.
- Observation semantics are everything — misinterpreting observation elements is the #1 source of silent RL bugs; action scaling is #2.
- Version matters — Gymnasium vs gym, MuJoCo 3.x vs 2.x; APIs change between versions.