num_env_runners vs. num_envs_per_env_runner with remote_worker_envs=True

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

I am looking for some clarity on the difference between increasing num_env_runners and increasing num_envs_per_env_runner with remote_worker_envs=True. num_env_runners=N is said to create N copies of the RLModule/policy and step their envs in parallel. So what is the difference, if increasing num_envs_per_env_runner and setting remote_worker_envs=True does the same thing (parallel reset/step)?

I am ultimately trying to determine the best configuration for scaling an environment that can only step at about 2 Hz.

Hey @Pitcherrr, good question!

The remote_worker_envs setting stems from the old API stack, where we had the option to run each individual env (within a vector env) as its own Ray actor (in a separate process from the RolloutWorker). However, this is only recommended for very slow envs, where a single env.step() takes a considerable amount of time.

On the new API stack:

  • you can use the same setting (we’ll probably rename it soon to clarify its meaning)
  • it only works for single-agent setups (we are working on a fix for multi-agent, thanks to the new gymnasium==1.0.0 upgrade)
  • setting it to True does NOT mean we actually create Ray actors for each sub-env; rather, we use gymnasium’s built-in vectorization feature. This means each sub-env gets its own process, but it uses multiprocessing (rather than Ray) under the hood, which is faster. A config sketch follows this list.
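
For concreteness, here is a minimal, hedged config sketch on the new API stack. PPO, CartPole-v1, and the specific counts are placeholders (not recommendations); your slow 2 Hz env would go into .environment() instead:

```python
# Minimal sketch (new API stack, Ray 2.x); algorithm, env, and counts are placeholders.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # substitute your own (slow) env here
    .env_runners(
        num_env_runners=4,           # 4 EnvRunner actors, each with its own RLModule copy
        num_envs_per_env_runner=8,   # 8 vectorized sub-envs per EnvRunner (batched inference)
        remote_worker_envs=True,     # step each sub-env in its own process (gymnasium vectorization)
    )
)
algo = config.build()
```

The rough trade-off, as described above: num_env_runners scales out to more EnvRunner processes (each with its own RLModule copy), while num_envs_per_env_runner with remote_worker_envs=True parallelizes env stepping behind a single RLModule copy, so actions for all sub-envs are computed in one batched forward pass.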

Hi, thanks for the reply! But I’m a little confused by the code documentation.

In the old API stack’s rollout config, the docstring says:

"""
num_envs_per_worker: Number of environments to evaluate vector-wise per
                worker. This enables model inference batching, which can improve
                performance for inference bottlenecked workloads.
"""

In the new API stack, it similarly says:

"""
num_envs_per_env_runner: Number of environments to step through
    (vector-wise) per EnvRunner. This enables batching when computing
    actions through RLModule inference, which can improve performance
    for inference-bottlenecked workloads.
"""

Both say that this feature is only for “evaluate/inference”. But in my experience, it does speed up training progress (more significantly than increasing the number of workers/env_runners). I’m wondering: in RLlib’s context, does “evaluate/inference” refer to “sample”?

Hi @Morphlng,

Yes, during the sampling phase the policy network is only doing inference on observations coming from the environment.
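
To illustrate what “inference” means during sampling, here is a conceptual sketch using plain gymnasium (not RLlib internals); the random-action function stands in for a batched RLModule forward pass, which is why more sub-envs per EnvRunner can speed up sample collection:

```python
import numpy as np
import gymnasium as gym

# 8 sub-envs stepped vector-wise, roughly what a single EnvRunner does.
num_envs = 8
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(num_envs)]
)

def policy_inference(batched_obs):
    # Stand-in for one batched forward pass of the policy; here just random actions.
    return np.array([envs.single_action_space.sample() for _ in batched_obs])

obs, _ = envs.reset()                      # obs has shape (num_envs, obs_dim)
for _ in range(100):
    actions = policy_inference(obs)        # ONE inference call for all 8 observations
    obs, rewards, terms, truncs, infos = envs.step(actions)
```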