Hi,
I know Ray spins up one worker process per available core by default, but is it possible to change that? In my case I would like to reduce this number.
Yes! You can set num_cpus as an option in ray.init, or pass --num-cpus to ray start, depending on how you use Ray. You can provide any positive integer there and it will limit the number of worker processes.
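For example, something like this (the value 2 is just a placeholder for however many worker processes you want):

import ray

# cap Ray at 2 CPUs, so only 2 worker processes are started upfront
ray.init(num_cpus=2)

or, if you start the cluster from the command line (assuming a single head node here):

ray start --head --num-cpus=2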
but this will also limit the number of CPUs that I have available when I create an actor right? I want to run just 1 worker process, but I need my actor processes to use all the available CPUs
Yes, ray.init(num_cpus=n) will limit the overall number of cores that Ray uses.
If you want to give an actor control over a CPU core that is managed by ray, you can do the following:
@ray.remote(num_cpus=n)
class CPUActor(object):
    pass
Similar to the examples in the documentation of Ray actors, this will reserve n CPU cores for your actor.
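As a quick usage sketch (with n replaced by a concrete value, and assuming ray.init has already been called):

# creating the actor reserves its n CPU slots for as long as the actor is alive
actor = CPUActor.remote()

# the reservation is released again when the actor goes away
ray.kill(actor)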
If you leave out these options and declare your actor like this:
@ray.remote
class SomeActor(object):
    pass
… Ray will try to make things work for this actor, but you will not have the guarantee that your actor has n CPU cores under its control at any given time.
OK, so from what I understand there is no way to limit the number of upfront processes created by Ray (the idle ones used by the remote functions) without limiting the overall CPU access. It would be a nice-to-have, though.
The num_cpus setting specifies a limit for Ray on the number of concurrent tasks that it can schedule: the declared CPU requirements of the running tasks cannot sum to a value greater than num_cpus. It does not, however, place any restrictions on the actual amount of system resources a running task utilizes. You could schedule a task that you tell Ray requires 1 “cpu” but that actually uses 400. At that point it is up to you to make sure you are not oversubscribing the system. In some ways num_cpus is a misnomer; it is really more like max_tasks.
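A small sketch of that “max_tasks” behaviour (the numbers are arbitrary):

import time
import ray

ray.init(num_cpus=2)        # "2 cpus" here really means: at most 2 concurrent task slots

@ray.remote                 # each remote function asks for 1 "cpu" slot by default
def idle_task():
    time.sleep(1)           # uses essentially no CPU, yet still occupies a slot
    return True

start = time.time()
ray.get([idle_task.remote() for _ in range(6)])
# 6 tasks / 2 slots -> roughly 3 seconds in total, even though the cores sat idle;
# conversely, a task that declares 1 "cpu" is free to spawn threads and use far more
print(f"took ~{time.time() - start:.1f}s")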
If you reserve m cores for ray and n cores “inside” ray for individual actors, your raylet scheduler process will have m - n cores available for “other” remote function calls. The actual “number of upfront processes created by ray” can, to my knowledge, only be limited by their resource requirements.
But this makes sense, right? Why would you want to limit the number of processes upfront and end up with errors if you spawn more processes? Better to spawn processes according to your needs and hope they get enough processing time on a core for things to work out for you. If they do not get scheduled in time, your cluster is simply not powerful enough for your program to be handled by Ray, and limiting the “number of upfront processes created by ray” will not change that but only impose another restriction on your program.
I hope I am not on the wrong track here. Does this help?
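To make the m and n bookkeeping above concrete, here is a small sketch with hypothetical numbers (m = 8, n = 2; the Holder class is just a placeholder name):

import time
import ray

ray.init(num_cpus=8)                 # m = 8 cores reserved for Ray

@ray.remote(num_cpus=2)              # n = 2 of them pinned to this actor while it lives
class Holder(object):
    pass

holder = Holder.remote()
time.sleep(2)                        # give the scheduler a moment to place the actor

# roughly m - n = 6 CPU slots should remain for ordinary remote function calls
print(ray.available_resources().get("CPU"))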
OK, maybe the whole thing is confusing the way I said it. I have 8 cores on a VM where I run a Ray application. I have an actor which I start and which is supposed to be always up. When I start the application and go to the Ray dashboard, I see the process created for my actor, plus 8 more processes which are created by Ray by default, one process per core. These 8 processes are always idle, and if a remote function is run then one of these idle processes runs the function.
Now, my issue is this: I do not want 8 processes idling. I want to decide how many processes get created upfront; maybe I want just one idling process, maybe two, but I don’t want 8. You might ask why, and this is because whenever I start these processes they all hold something like 72MB of RAM each. This is nothing, although I am not sure why 72*8 = 576MB of RAM should be there, considering that the VM is not running just this Ray application but also other applications under k8s. On top of that, whenever I run a remote function, the memory of each one of these 8 processes goes up to around 360MB, and 360*8 = 2880MB is almost 3GB. Now you might understand why I would not want to have 8 idling processes taking up this memory, right?
So, in the end, what I need to know is whether it is possible to limit the number of these idling processes which are created by default when Ray starts.
Well, limiting the upfront processes does not mean I cannot create new processes for each actor. If, say, I have only 1 process created upfront, I can still create 300 actors, each one running in a different process… if I really want to. My problem is not limiting the CPUs that I want to use, but just the number of idling processes created upfront. They do nothing, there are too many of them, and they take up too much memory.
Just out of curiosity: have you checked that those processes actually take up memory that adds up?
As far as I know, *NIX-like systems show the memory requirements per process in a way where shared memory is included, e.g. memory needed for libraries and Python modules. But those shared memory chunks only exist once, so adding up the memory shown for each process will overestimate the total memory actually used, often by a lot.
So instead of looking at those per-process sizes and how they add up, it may be more interesting to watch how the total free memory goes down, which may turn out to be less of a problem (all assuming that a large part of the memory shown for each process is indeed shared memory).
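If it helps, a rough way to check this with psutil (a sketch only; the name match below is just a heuristic for picking out the Ray worker processes, and uss needs a reasonably recent Linux/psutil):

import psutil

# rss counts shared pages (libraries, imported modules) once per process,
# while uss ("unique set size") is the memory that would actually be freed
# if that single process went away
for proc in psutil.process_iter(["pid", "name"]):
    try:
        if "ray" in (proc.info["name"] or "").lower():
            mem = proc.memory_full_info()
            print(proc.info["pid"], proc.info["name"],
                  f"rss={mem.rss / 1e6:.0f}MB",
                  f"uss={mem.uss / 1e6:.0f}MB")
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        pass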
Good point, I thought this too. I need to check again, but essentially the original issue is that we have a memory leak in some old code, which is huge and complex and which we are forced to keep for now, until we are able to rewrite the whole thing. We have a hack in place where we run the leaking code in a subprocess, and when we terminate the subprocess the memory gets freed up. Now that we use Ray, we just wanted to run that code as a remote function. But a remote function uses a process from the pool, so the process does not get destroyed; instead, I ran the code inside an actor and killed the actor after each run. This works, but when we tried it with the remote function I was checking with htop on the VM and it all seemed to add up. However, the leakage was filling up to 6GB of RAM, so I could not tell whether these 8*360MB of RAM were actually all held. Will try though, thanks for pointing this out.
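For what it’s worth, a minimal sketch of that actor-per-run pattern (leaky_legacy_code is just a stand-in for the old code):

import ray

ray.init()

def leaky_legacy_code(job):
    # placeholder for the old, leaking code
    return job * 2

@ray.remote
class LeakyRunner(object):
    def run(self, job):
        return leaky_legacy_code(job)

for job in range(3):
    runner = LeakyRunner.remote()            # fresh worker process for this run
    result = ray.get(runner.run.remote(job))
    ray.kill(runner)                         # process is torn down, leaked memory goes with it
    print(result)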