num_env_runners vs. num_envs_per_env_runner with remote_worker_envs=True

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

I am looking for some clarity on the difference between increasing num_env_runners and increasing num_envs_per_env_runner with remote_worker_envs=True. num_env_runners=N is said to create N copies of the RLModule/policy and step their envs in parallel. So what is the difference, if increasing num_envs_per_env_runner and setting remote_worker_envs=True does the same thing (parallel reset/step)?

I am ultimately trying to determine the best configuration for scaling an environment that can only step at about 2 Hz.

Hey @Pitcherrr, good question!

The remote_worker_envs setting stems from the old API stack, where we had the option to run each individual env (within a vector env) as its own Ray actor (in a separate process from the RolloutWorker). However, this is only recommended for very slow envs, where a single env.step() takes a considerable amount of time.

On the new API stack:

  • you can use the same setting (we’ll probably rename it soon to clarify its meaning)
  • it only works for single-agent setups (we are working on a fix for multi-agent, thanks to the new gymnasium==1.0.0 upgrade)
  • setting it to True does NOT mean we actually create Ray actors for each sub-env; rather, we use gymnasium’s built-in vectorization feature. This means each sub-env gets its own process, but it uses multiprocessing (rather than Ray) under the hood, which is faster. A config sketch follows this list.
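
For concreteness, here is a minimal, hedged config sketch on the new API stack. PPO, CartPole-v1, and the specific counts are placeholders (not recommendations); your slow 2 Hz env would go into .environment() instead:

```python
# Minimal sketch (new API stack, Ray 2.x); algorithm, env, and counts are placeholders.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # substitute your own (slow) env here
    .env_runners(
        num_env_runners=4,           # 4 EnvRunner actors, each with its own RLModule copy
        num_envs_per_env_runner=8,   # 8 vectorized sub-envs per EnvRunner (batched inference)
        remote_worker_envs=True,     # step each sub-env in its own process (gymnasium vectorization)
    )
)
algo = config.build()
```

The rough trade-off, as described above: num_env_runners scales out to more EnvRunner processes (each with its own RLModule copy), while num_envs_per_env_runner with remote_worker_envs=True parallelizes env stepping behind a single RLModule copy, so actions for all sub-envs are computed in one batched forward pass.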

Hi, thanks for the reply! But I’m a little confused by the code documentation.

In the old API stack’s rollout config, the docstring says:

"""
num_envs_per_worker: Number of environments to evaluate vector-wise per
                worker. This enables model inference batching, which can improve
                performance for inference bottlenecked workloads.
"""

In the new API stack, it similarly says:

"""
num_envs_per_env_runner: Number of environments to step through
    (vector-wise) per EnvRunner. This enables batching when computing
    actions through RLModule inference, which can improve performance
    for inference-bottlenecked workloads.
"""

Both say that this feature is only for “evaluate/inference”. But in my experience, it does speed up training progress (more significantly than increasing the number of workers/env_runners). I’m wondering: in RLlib’s context, does “evaluate/inference” refer to “sample”?

Hi @Morphlng,

Yes, during the sampling phase the policy network is only doing inference on observations coming from the environment.
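
To illustrate what “inference” means during sampling, here is a conceptual sketch using plain gymnasium (not RLlib internals); the random-action function stands in for a batched RLModule forward pass, which is why more sub-envs per EnvRunner can speed up sample collection:

```python
import numpy as np
import gymnasium as gym

# 8 sub-envs stepped vector-wise, roughly what a single EnvRunner does.
num_envs = 8
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(num_envs)]
)

def policy_inference(batched_obs):
    # Stand-in for one batched forward pass of the policy; here just random actions.
    return np.array([envs.single_action_space.sample() for _ in batched_obs])

obs, _ = envs.reset()                      # obs has shape (num_envs, obs_dim)
for _ in range(100):
    actions = policy_inference(obs)        # ONE inference call for all 8 observations
    obs, rewards, terms, truncs, infos = envs.step(actions)
```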