I’m having difficulty finding a good way to use only a subset of the CPUs in an established cluster in Ray 1.0.1. I don’t want to tear down and restart the cluster every time a user requests a parallel operation with fewer CPUs than the total.
For example, an easy solution would be this, if it existed:
ray.init(num_cpus=100)
results = ray.get(jobs, processes=25) # use 25 out of 100 CPUs
But since ray.get() doesn’t take a processes argument, I tried this:
ray.init(num_cpus=100)
pool = ray.multiprocessing.Pool(processes=25, ray_address='auto')
results = pool.starmap(f, args_list)
del pool
But the pool implementation is barely faster than single-process execution in my setup, and it also produces many crash messages, some related to memory shortages and some not (ray.get() has neither the performance nor the crashing issues for me).
I also tried splitting the list of jobs into chunks and calling ray.get() on each chunk, but this used more CPUs than the chunk size, so it didn’t limit CPU utilization.
Is there a different way to do this that doesn’t require the pool interface?
Thanks
John