1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
Ray version: 2.38.0
Python version: 3.10.16 (cluster), 3.10.12 (local)
OS: Ubuntu 22.04
Cloud/Infrastructure: MicroK8s installed on a VPS
Other libs/tools (if relevant):
Hello Ray community,
I’m running into an issue, and maybe someone here can help.
I would like to force the environment runners to be deployed onto separate Ray worker nodes, because the simulation stack I am using is cumbersome to scale. All of its programs talk to each other over localhost ports using TCP and UDP. This is not something I can really change, and it prevents easy parallelization of environments.
To scale anyway, I figured I could run many parallel pods, each containing my simulation stack plus a Ray worker container. That way I would have n parallel Ray workers, each running a single simulation. However, when I specify num_env_runners = 2 in my config, both env runners seem to be deployed onto a single Ray worker (which is running only a single simulation).
Hello Flim!
Can you try setting remote_worker_envs=True in your config to see if that helps? I see it mentioned here: Environments — Ray 2.43.0, as a way “to create individual subenvironments as separate processes and step them in parallel”. Just add it to the .env_runners() section of your config.
It’s described as this:
remote_worker_envs – If using num_envs_per_env_runner > 1, whether to create those new envs in remote processes instead of in the same worker. This adds overheads, but can make sense if your envs can take much time to step / reset (e.g., for StarCraft). Use this cautiously; overheads are significant.
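A minimal sketch of adding this to an RLlib config; the PPO algorithm and CartPole environment below are placeholders for your own setup, not something from your post:

```python
from ray.rllib.algorithms.ppo import PPOConfig  # placeholder algorithm

config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder env
    .env_runners(
        num_env_runners=2,
        num_envs_per_env_runner=2,
        # Step each sub-environment in its own remote process
        # instead of inside the env runner's process.
        remote_worker_envs=True,
    )
)
```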
Do you think this would be helpful for your use case?
Let me know if it helps!!
Christina
One method is to limit num_cpus for each Ray worker node you start. The driver claims 1 CPU and each env runner claims 1 CPU, so with create_env_on_local_worker=True one of the Ray workers will need 2 CPUs.
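A sketch of this approach, assuming each simulation pod joins the cluster as a 1-CPU worker (the algorithm and env here are placeholders):

```python
# Start each pod's Ray worker with a single CPU, e.g.:
#   ray start --address=<head-ip>:6379 --num-cpus=1
# With only 1 CPU per node, Ray can schedule at most one
# 1-CPU env runner onto each pod.
from ray.rllib.algorithms.ppo import PPOConfig  # placeholder algorithm

config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder env
    .env_runners(
        num_env_runners=2,
        num_cpus_per_env_runner=1,  # each runner claims one full CPU
    )
)
```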
The other option is to use custom resources. You can start a Ray worker with a custom resource by following this documentation.
In the env_runner config you can specify a custom resource with this setting:
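A sketch of the custom-resource approach; the resource name "sim_node" is made up for illustration, and the parameter name custom_resources_per_env_runner is my assumption for recent Ray versions:

```python
# Start each simulation pod's Ray worker advertising one unit of a
# custom resource, e.g.:
#   ray start --address=<head-ip>:6379 --resources='{"sim_node": 1}'
from ray.rllib.algorithms.ppo import PPOConfig  # placeholder algorithm

config = (
    PPOConfig()
    .env_runners(
        num_env_runners=2,
        # Each env runner requires one "sim_node" unit, so Ray places
        # at most one runner per pod (each pod advertises sim_node=1).
        custom_resources_per_env_runner={"sim_node": 1},
    )
)
```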