1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
Ray version: 2.38.0
Python version: 3.10.16 (cluster), 3.10.12 (local)
OS: Ubuntu 22.04
Cloud/Infrastructure: MicroK8s installed on a VPS
Other libs/tools (if relevant):
Hello Ray community,
I’m running into an issue, and maybe someone here can help.
I would like to force the environment runners to be deployed onto separate Ray worker nodes, because the simulation stack I am using is cumbersome to scale. All of its programs talk to each other over localhost ports using TCP and UDP. This is not something I can really change, and it prevents easy parallelization of environments.
To scale anyway, I figured I could run many parallel pods, each containing my simulation stack plus a Ray worker container. That way I would have n parallel Ray workers, each running a single simulation. However, when I specify num_env_runners = 2 in my config, both env runners seem to be deployed onto a single Ray worker (which is running only a single simulation).
Hello Flim!
Can you try setting remote_worker_envs=True in your config to see if that helps? I see it mentioned here: Environments — Ray 2.43.0, as a way “to create individual subenvironments as separate processes and step them in parallel”. Just add it to the .env_runners() section of your config.
It’s described as this:
remote_worker_envs – If using num_envs_per_env_runner > 1, whether to create those new envs in remote processes instead of in the same worker. This adds overheads, but can make sense if your envs can take much time to step / reset (e.g., for StarCraft). Use this cautiously; overheads are significant.
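A minimal sketch of adding this to an RLlib config; the PPO algorithm and CartPole environment below are placeholders for your own setup, not something from your post:

```python
from ray.rllib.algorithms.ppo import PPOConfig  # placeholder algorithm

config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder env
    .env_runners(
        num_env_runners=2,
        num_envs_per_env_runner=2,
        # Step each sub-environment in its own remote process
        # instead of inside the env runner's process.
        remote_worker_envs=True,
    )
)
```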
Do you think this would be helpful for your use case?
Let me know if it helps!!
Christina
One method is to limit num_cpus for each Ray worker node you start. The driver claims 1 CPU and each env runner claims 1 CPU, so with create_env_on_local_worker=True one of the Ray workers will need 2 CPUs.
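A sketch of this approach, assuming each simulation pod joins the cluster as a 1-CPU worker (the algorithm and env here are placeholders):

```python
# Start each pod's Ray worker with a single CPU, e.g.:
#   ray start --address=<head-ip>:6379 --num-cpus=1
# With only 1 CPU per node, Ray can schedule at most one
# 1-CPU env runner onto each pod.
from ray.rllib.algorithms.ppo import PPOConfig  # placeholder algorithm

config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder env
    .env_runners(
        num_env_runners=2,
        num_cpus_per_env_runner=1,  # each runner claims one full CPU
    )
)
```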
The other option is to use custom resources. You can start a Ray worker with a custom resource by following this documentation.
In the env_runner config you can specify a custom resource with this setting:
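A sketch of the custom-resource approach; the resource name "sim_node" is made up for illustration, and the parameter name custom_resources_per_env_runner is my assumption for recent Ray versions:

```python
# Start each simulation pod's Ray worker advertising one unit of a
# custom resource, e.g.:
#   ray start --address=<head-ip>:6379 --resources='{"sim_node": 1}'
from ray.rllib.algorithms.ppo import PPOConfig  # placeholder algorithm

config = (
    PPOConfig()
    .env_runners(
        num_env_runners=2,
        # Each env runner requires one "sim_node" unit, so Ray places
        # at most one runner per pod (each pod advertises sim_node=1).
        custom_resources_per_env_runner={"sim_node": 1},
    )
)
```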