How severely does this issue affect your experience of using Ray?
None: Just asking a question out of curiosity
I am looking for some clarity on the difference between increasing num_env_runners versus increasing num_envs_per_env_runner with remote_worker_envs=True. num_env_runners=N is said to create N copies of the RLModule/policies and step their envs in parallel, so what is the difference if increasing num_envs_per_env_runner and setting remote_worker_envs=True does the same thing (parallel reset/step)?
I am ultimately trying to determine the best configuration to scale an environment that can only step at about 2 Hz.
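For concreteness, here is a minimal sketch of the two configurations I am comparing, assuming a recent RLlib release with the new API stack (the env and the numbers are placeholders, and method/argument names may differ slightly across versions):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Option A: scale out across EnvRunner actors, each holding its own RLModule copy.
config_a = (
    PPOConfig()
    .environment("CartPole-v1")  # stand-in for the slow ~2 Hz env
    .env_runners(num_env_runners=8, num_envs_per_env_runner=1)
)

# Option B: few EnvRunners, each stepping a vector of sub-envs, with
# remote_worker_envs=True to parallelize the sub-env reset/step calls.
config_b = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(
        num_env_runners=1,
        num_envs_per_env_runner=8,
        remote_worker_envs=True,
    )
)
```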
The remote_worker_envs setting stems from the old API stack, where we had the option to run each individual env (within a vector env) as its own Ray actor (in a separate process from the RolloutWorker). However, using this is only recommended for very slow envs, where a single env.step() takes a considerable amount of time.
On the new API stack:
- you can use the same setting (we'll probably rename it soon to clarify its meaning),
- it only works for single-agent setups (we are working on a fix for multi-agent, thanks to the new gymnasium==1.0.0 upgrade),
- setting it to True does NOT mean we actually create Ray actors for each sub-env; rather, we use gymnasium's built-in vectorization feature. Each sub-env still gets its own process, but this uses multiprocessing (rather than Ray) under the hood, which is faster (see the sketch below).
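To illustrate the underlying mechanism (this is not RLlib's actual internals, just what gymnasium itself provides), AsyncVectorEnv fans each sub-env out to its own worker process via multiprocessing and batches reset/step:

```python
import gymnasium as gym

def make_env():
    return gym.make("CartPole-v1")  # stand-in for a slow env

if __name__ == "__main__":
    # One process per sub-env, using multiprocessing (not Ray) under the hood.
    vec_env = gym.vector.AsyncVectorEnv([make_env for _ in range(4)])
    obs, infos = vec_env.reset(seed=42)
    # A single call steps all 4 sub-envs concurrently; actions/obs are batched.
    actions = vec_env.action_space.sample()
    obs, rewards, terminations, truncations, infos = vec_env.step(actions)
    vec_env.close()
```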
Hi, thanks for the reply! But I’m a little confused by the code documentation.
In the old stack's rollout config, the documentation says:
"""
num_envs_per_worker: Number of environments to evaluate vector-wise per
worker. This enables model inference batching, which can improve
performance for inference bottlenecked workloads.
"""
In the new stack, it also says:
"""
num_envs_per_env_runner: Number of environments to step through
(vector-wise) per EnvRunner. This enables batching when computing
actions through RLModule inference, which can improve performance
for inference-bottlenecked workloads.
"""
Both say that this feature is only for "evaluate/inference". But in my experience, it does indeed speed up training progress (more significantly than increasing the number of workers/env_runners). I'm wondering whether, in RLlib's context, "evaluate/inference" refers to "sampling"?
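For context, this is the kind of loop I have in mind: during sampling, one batched forward pass computes actions for all sub-envs at once instead of one model call per env. This is just a schematic with a placeholder policy, not RLlib's actual EnvRunner code:

```python
import gymnasium as gym
import numpy as np

if __name__ == "__main__":
    num_envs = 8
    vec_env = gym.vector.SyncVectorEnv(
        [lambda: gym.make("CartPole-v1") for _ in range(num_envs)]
    )
    obs, _ = vec_env.reset(seed=0)

    def policy_forward(batched_obs):
        # Placeholder for the RLModule forward pass; in RLlib this is where
        # the batched observations would go through the network in one call.
        return np.array(
            [vec_env.single_action_space.sample() for _ in batched_obs]
        )

    for _ in range(10):
        actions = policy_forward(obs)  # one batched "inference" call per step
        obs, rewards, terms, truncs, _ = vec_env.step(actions)  # 8 env steps
    vec_env.close()
```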