Question about resource management in Ray

Hi all! I’m using Ray for Reinforcement Learning via RLlib, but I’d like to understand how Ray manages resources. I have seen that even when telling Ray to use only one CPU (via ray.init(num_cpus=1)), it tends to use all the available CPUs in the system (currently 40). What I want is to run Ray (and consequently RLlib) using only one of the CPUs of my system, in order to leave the other 39 free for other tasks.

I tried calling os.sched_setaffinity(0, {0}) in the script where I call ray.init(num_cpus=1) and start my training agent. This visually produces the effect I wanted to see: only one busy CPU while the training executes. But I still don’t know whether Ray is scheduling tasks to run across 40 CPUs and these tasks are then externally forced onto the same CPU, or whether Ray is aware of the restriction and only creates tasks as if it were running on a single-CPU machine.

I’d also like to know which parameter or config Ray uses to determine how many resources are available to it, since I have checked that the num_cpus initialization param cannot be what it uses for that: even when setting this value to 1, all CPUs are used during Ray executions.

Thank you so much in advance for your answers. All I want to do is train an RLlib PPO agent using the lowest possible number of CPUs in my system (I also have two GPUs that I should use) in order to leave the rest free for other tasks.

PS: I’m using Ray 1.1.0 and training a PPO agent with RLlib, with TF as the underlying framework.

So you want to know how many resources Ray is actually using during execution, and what command to use to see that? I’m not 100% sure I understand the question, so I wanted to confirm.


Hi @bill-anyscale and thank you so much for your answer. What I want is to see how many resources Ray is managing at each moment, and also how it is managing them (how it creates tasks according to these resources, or which tasks it schedules onto them). In addition, I would like to know how I can force Ray to use only part of the available resources in my system. I have seen that for limiting GPU usage you can set the environment variable CUDA_VISIBLE_DEVICES to select which of the available GPUs to use, so I was looking for an efficient way to do the same thing with CPUs.

My final objective is to evaluate the performance of the Ray RLlib PPO training process (which consists of a driver and a series of rollout workers) while minimizing the number of CPUs used, in order to see whether it is efficient to place all the Ray jobs on a single CPU and leave the rest of the CPUs free for other processes. So I want to know if there is any way to tell Ray to use only a single CPU (or a reduced group of CPUs) in a multi-CPU system.

I tried setting num_cpus=1 when calling ray.init(), but this didn’t seem to produce the desired effect, so I tried to force it using the os.sched_setaffinity() function. As I said, this produced the desired visual effect (when monitoring CPU usage with htop, only one CPU was shown to have tasks running on it), but I really don’t know if this is an efficient way to achieve my goal, because maybe Ray is scheduling tasks and threads to be executed over 40 CPUs, and these tasks are later forced to run on only one CPU.

So my question is twofold: first, how can I see the tasks Ray has scheduled and the resources that are available to Ray (and taken into account when scheduling those tasks)? And second, how can I efficiently reduce CPU utilization when using Ray for my purpose (RLlib PPO training)?

Thanks in advance!

cc @sven1977 What’s the recommended way to solve this problem for RLlib? Does he need to do something at the Ray layer?


@sangcho , I think this is at the Ray layer. Shouldn’t a ray.init limit fix this on a local machine? Also can’t you change the tune trials to limit the number running in parallel? cc @amogkam


To answer your first question, the Ray dashboard is your best bet to see the available resources to Ray and what tasks/actors are being scheduled. In order to limit the number of CPUs that Ray uses, setting num_cpus=1 in your ray.init should do the trick (@sangcho can confirm). When you say Ray is using 40 CPUs even though you limited it to 1, how are you measuring this?


Hi @amogkam and thanks for your answer. What I mean is that even when I initialize Ray with num_cpus=4 (for example), I see tasks being executed on all 40 CPUs of my system, and cpu_util_percent (a metric reported by Ray Tune) isn’t smaller than when I don’t specify this parameter at initialization. So what I wanted to know was how Ray limits resources (for example, whether it places tasks only on a specific set of CPUs) and how it decides how many tasks to create.

And about the Ray dashboard: I’m running Ray on a remote system connected via SSH, so is there any way to see this info there?

Thank you again!

Can you send what your console output looks like?


Of course! But what do you want me to show? I mean, which output (which information) do you want to see? When I mentioned earlier that I saw tasks being executed on all 40 CPUs, I was referring to monitoring CPU usage via htop.

Thanks again!

@javigm98 are you using Ray Core or a library?


@javigm98 actually, Ray doesn’t ensure CPU affinity. That is, although you set 4 CPUs, it doesn’t mean Ray will actually use only 4 CPUs of your machine. Also note that Ray has many other components that use CPUs, such as the raylet (which can use multiple threads that can run on all CPUs of your machine).

num_cpus is more for bookkeeping (actual resource isolation is the job of the container layer).


Hi @sangcho, I was using RLlib, but I think I discovered the problem. I was using Ray 1.1.0, and when I updated to 1.2.0 I was able to see executions on only 3 CPUs when initializing with ray.init(num_cpus=3). So, just to confirm: was this behavior changed from Ray 1.1.0 to 1.2.0?

Hmm, I am confused about what you mean by “executions on CPUs” now. Do you mean that the number of processes corresponds to the number of CPUs, or the actual usage of CPUs?


Well, in fact I was interested in both things. My final objective was to reduce CPU usage in an efficient way and distribute the workload among the lowest possible number of CPUs. But what I have seen after updating to 1.2.0 is that only 3 CPUs are in use.

I’d like to understand what you mean by this. What do you mean by “only 3 CPUs are in use” here? How did you verify this?


Hi @sangcho, and sorry that my explanations were so confusing. I’m initializing Ray with ray.init(num_cpus=3) and using it to train an RLlib PPO model that consists of one driver (which is expected to use one CPU) and two workers (which are expected to use one CPU each), and they all share a GPU. So, when I execute that, what I expected to see when monitoring CPU usage with htop, for example, was that only 3 CPUs would have processes running on them. But the result I get is that all CPUs are in use, as you can see in the image:

So I’d like to know if this is the normal Ray behaviour, and in that case, why it makes sense to restrict the number of resources when initializing Ray. In addition, I’d like to know how Ray plans the tasks (threads and processes) to be executed, i.e. how many resources it considers available when planning. For example, would it behave the same way if, for the same execution, I had initialized Ray without specifying resource constraints?

And furthermore, is there any way to tell Ray to execute only on a subset of the resources? For example, I want Ray to use only CPUs 0 to 2 and leave the others completely free for other tasks and programs. Is it appropriate to do this using the os.sched_setaffinity() function?

I hope the explanations are clearer now, and once again, sorry for the confusion. Thank you so much in advance!

Looking at your screenshot, I see at least 6 instances of your training script running. Is it possible that you have past training runs that did not shut down, all running at the same time?

I am looking at the “python training_scripts/train_model_ppo…” processes.


Hi @mannyv. I don’t think so… Every time I initialize Ray, I make sure to call ray.shutdown() first… And the script is only being executed once…

They are definitely running in the screenshot you posted.

It is normal that it uses more than 3 CPUs, because num_cpus=3 doesn’t actually mean Ray will only use 3 CPUs. It is more of a “scheduling hint” rather than real resource isolation. So it can technically use all the CPUs on your machine (a little of each).
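To tie this back to the earlier affinity question: if you want real OS-level isolation rather than Ray’s logical accounting, setting the affinity mask before starting Ray is one reasonable approach on Linux, since child processes (the raylet and the workers it launches) inherit the mask of the driver that spawns them. A minimal sketch, with the Ray calls commented out so the snippet stays stdlib-only; note this covers a local ray.init, not connecting to an already-running cluster:

```python
import os

# Linux-only: restrict this process to CPUs 0-2 *before* starting Ray.
# Processes forked from here inherit the affinity mask, so everything
# Ray launches from this driver should stay on those cores.
os.sched_setaffinity(0, {0, 1, 2})

# Then start Ray with a matching logical count, e.g.:
#   import ray
#   ray.init(num_cpus=3)

# Verify the mask took effect (it is intersected with the CPUs
# actually online on this machine).
print(sorted(os.sched_getaffinity(0)))
```

Keeping num_cpus consistent with the mask matters: the mask confines the work to 3 cores, while num_cpus keeps Ray from *scheduling* more concurrent 1-CPU tasks than those cores can reasonably run.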