EnvRunners vs VectorEnvs at Scaled Networking Distribution

Xavier_Geerinck · September 13, 2024, 3:36pm

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

We are currently scaling simulators with Ray to gather experience across our network infrastructure. What is the recommended way to implement this in RLlib?

We currently integrate the communication stack into the VectorEnv through a custom EnvRunner and VectorEnv implementation, but we feel this adds considerable overhead, since each sim requires its own thread and event loop. The event loops in turn cause downstream problems with thread-context ownership, which gRPC is sensitive to.
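To make the thread and event-loop concern concrete, here is a minimal sketch of one alternative layout: a single background thread owning a single asyncio loop that multiplexes every sim, so all async clients are created and used on the same loop. This is not our actual stack. `SharedLoop`, `fake_sim_step`, and `step_all` are hypothetical names, and the sleep stands in for a real network round trip.

```python
import asyncio
import threading


class SharedLoop:
    """One background thread running one asyncio event loop for all sims."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def call(self, coro):
        """Run a coroutine on the shared loop and block until it finishes."""
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    def close(self) -> None:
        """Stop the loop from the calling thread, then release its resources."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_sim_step(sim_id: int, action: int) -> tuple[int, int]:
    # Hypothetical stand-in for a network round trip to a remote simulator.
    await asyncio.sleep(0.01)
    return sim_id, action + 1


async def step_all(actions: list[int]) -> list[tuple[int, int]]:
    # Issue every request concurrently; gather preserves the input order.
    return await asyncio.gather(
        *(fake_sim_step(i, a) for i, a in enumerate(actions))
    )


def demo() -> list[tuple[int, int]]:
    shared = SharedLoop()
    try:
        # A synchronous caller (e.g. a vectorized step) blocks here once
        # for all sims instead of once per sim-specific thread.
        return shared.call(step_all([0, 10, 20]))
    finally:
        shared.close()
```

With very slow sims, the blocking `call` is dominated by the slowest sim in the batch, which is part of why a 1:1 actor layout looks attractive to us.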

Would it make sense to drop the VectorEnv entirely and instead use EnvRunners in a 1:1 fashion (one EnvRunner per sim) as scalable actors and communicators? Or are there other best practices we should look at?
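For reference, the 1:1 layout we have in mind would look roughly like the sketch below. It assumes a recent Ray release where `AlgorithmConfig.env_runners()` accepts `num_env_runners` and `num_envs_per_env_runner`. PPO, the helper names, and the `env_name` argument are placeholders for illustration only.

```python
def one_to_one_env_runner_settings(num_sims: int) -> dict:
    """Return env-runner settings that map each sim to its own EnvRunner."""
    if num_sims < 1:
        raise ValueError("num_sims must be at least 1")
    return {
        "num_env_runners": num_sims,    # one remote EnvRunner actor per sim
        "num_envs_per_env_runner": 1,   # no vectorization inside a runner
    }


def build_config(env_name: str, num_sims: int):
    """Build an RLlib config using the 1:1 layout (PPO is an assumption)."""
    # Imported lazily so the helper above can be used without Ray installed.
    from ray.rllib.algorithms.ppo import PPOConfig

    return (
        PPOConfig()
        .environment(env=env_name)
        .env_runners(**one_to_one_env_runner_settings(num_sims))
    )
```

Each EnvRunner would then own exactly one connection to one sim, so no per-sim threads or event loops are needed inside a runner.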

Note: the networking overhead is negligible here because the sims tend to be very slow and resource-intensive (which is why we think EnvRunners might suffice).
