Hardware requirements and setup for running performant APE-X

richardbellman · March 9, 2023, 1:21am

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

Hello! I am a new RLlib user and I would like to train APE-X agents on ALE. I have a few questions that I hope you can answer:

1. Currently, I have access to a machine with 64 CPUs (252GB of RAM) and 1 GPU (NVIDIA 3090 Ti, 24GB of GPU memory). How should I pick num_rollout_workers and num_envs_per_worker given these hardware specs?
2. If I want to reach about 40k frames per second with APE-X, what kind of compute infrastructure would I need? How many GPUs and how many CPUs? How much RAM?
3. Conceptual question: my understanding is that APE-X creates a learner on the GPU (which computes gradients and updates the network parameters) and several actors on CPUs. How do these actors determine the actions they send to the env? Does each actor have its own copy of the Q-network, or do the actors ask the Q-network on the GPU for actions? Were the results in the RLlib arXiv paper obtained by running the actors on CPU or on GPU?

Thank you in advance!
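For the first question, one way to get a starting point is to budget CPUs explicitly. The sketch below is only a heuristic, not an RLlib rule: the function name `suggest_rollout_settings` is hypothetical, and the values of `reserved_cpus` (cores left free for the driver, learner, and replay) and `num_envs_per_worker` are assumptions to be tuned by measuring throughput. The returned keys mirror the RLlib settings `num_rollout_workers`, `num_envs_per_worker`, and `num_gpus`.

```python
def suggest_rollout_settings(total_cpus, reserved_cpus=4, num_envs_per_worker=4, num_gpus=1):
    """Heuristic starting point for sampling settings (assumption, not an RLlib rule).

    total_cpus:          CPU cores available on the machine.
    reserved_cpus:       cores left free for the driver/learner and replay
                         (assumed value; tune for your setup).
    num_envs_per_worker: environments stepped by each worker (assumed value).
    """
    if total_cpus <= reserved_cpus:
        raise ValueError("total_cpus must exceed reserved_cpus")
    # One rollout worker per remaining core.
    num_rollout_workers = total_cpus - reserved_cpus
    return {
        "num_rollout_workers": num_rollout_workers,
        "num_envs_per_worker": num_envs_per_worker,
        "num_gpus": num_gpus,
        # Total environments sampled in parallel across all workers.
        "total_envs": num_rollout_workers * num_envs_per_worker,
    }


# The 64-CPU, 1-GPU machine described in the post.
settings = suggest_rollout_settings(total_cpus=64)
print(settings)
```

From there, the usual approach is to vary both values and keep whichever combination gives the highest measured sampling throughput without exhausting RAM.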

richardbellman · March 23, 2023, 8:13pm

Hello RLlib team, sorry to bother you again, but I am stuck on this and would really appreciate some insight. Figure 5b says that with 64 workers it is possible to reach 40k fps; the description says that 1 V100 GPU was used. But I cannot figure out how many CPUs and how much CPU RAM I would need to run a similar experiment with 64 workers.
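As a rough back-of-envelope check on those numbers, the sketch below works out the per-worker throughput implied by 40k fps across 64 workers and estimates the RAM an uncompressed replay buffer could take. The 84x84 frame size, the 4-frame stack, the choice to store both the observation and the next observation, and the 1,000,000-transition capacity are all assumptions for illustration, not values taken from the paper or from RLlib defaults. The helper `replay_buffer_gib` is hypothetical.

```python
TARGET_FPS = 40_000   # throughput mentioned for Figure 5b
NUM_WORKERS = 64      # number of workers mentioned for Figure 5b

# Frames per second each worker must sustain on average.
fps_per_worker = TARGET_FPS / NUM_WORKERS


def replay_buffer_gib(capacity, frame_shape=(84, 84), frame_stack=4, store_next_obs=True):
    """Estimate uncompressed replay memory in GiB for uint8 image observations.

    All defaults are assumptions: 84x84 grayscale frames, 4 stacked frames,
    and both obs and next_obs stored per transition.
    """
    bytes_per_obs = frame_shape[0] * frame_shape[1] * frame_stack  # 1 byte per pixel
    copies = 2 if store_next_obs else 1
    return capacity * bytes_per_obs * copies / 2**30


# Hypothetical capacity of one million transitions.
buffer_gib = replay_buffer_gib(capacity=1_000_000)

print(f"fps per worker: {fps_per_worker}")
print(f"replay buffer estimate: {buffer_gib:.1f} GiB")
```

Under these assumptions each worker needs to average 625 fps, and the replay buffer alone would account for a large share of host RAM, so buffer capacity and observation compression are likely the main levers on the memory side.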
