Use Ray to parallelize tasks

Hello,

Is it possible with Ray to launch a set of k tasks in such a way that it automatically assigns a given number of CPUs and a given number of GPUs to each task?

I mean without any tuning of the models.

Thank you,
Luca

Hi Luca, can you tell us a bit more about your use case? What do you want to achieve specifically?

Hello @kai, my use case was the following: I had just finished tuning a meta-model and needed to run some baseline models on the same set of multiple holdouts, so for M models, each one with T tasks, I would have needed to run H holdouts.

When running the optimization with Tune, Ray makes it easy to define the number of cores and the fraction of GPU usage per trial, and it generally handles the scheduling of the tasks according to the available resources.
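For reference, this is roughly what I mean in Tune (the trainable is a placeholder and the resource values are just examples):

from ray import tune

def trainable(config):
    pass  # train one configuration

# Each trial requests 2 CPUs and half a GPU; Tune schedules trials
# so the total never exceeds the available resources.
tune.run(trainable, resources_per_trial={"cpu": 2, "gpu": 0.5})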

I would have found it very helpful to be able to define a list of tasks, each one with its callback, its parameters and its resource requirements, and to feed this to something like results = ray.distribute(tasks).

From the Ray tutorials I have seen, I believe this is kind of possible, but I could not figure it out, so I ended up writing a single-use wrapper that does what I described above but only works on a single machine, not a cluster. With Ray's ability to scale across different systems, from notebooks to HPCs, something like this for distributing tasks would come in very handy. It would sort of be a relatively painless SLURM (nothing running on HPCs will ever be painless :stuck_out_tongue: ).
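For illustration, here is a minimal sketch of what I have in mind, built on Ray's existing primitives (the distribute name and the task-tuple format are made up):

import ray

# Hypothetical helper: each task is a (callback, args, resources) tuple.
def distribute(tasks):
    refs = [
        ray.remote(**resources)(callback).remote(*args)
        for callback, args, resources in tasks
    ]
    return ray.get(refs)

# Example usage (made-up tasks):
# results = distribute([
#     (train_model, ("model_a", 0), {"num_cpus": 2, "num_gpus": 1}),
#     (train_model, ("model_b", 0), {"num_cpus": 1, "num_gpus": 0}),
# ])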

I’m not sure if I understand correctly, but if you just want to start a bunch of remote tasks with specific resource requirements, you can do something like this:

import ray

ray.init()

def remote_task(arg):
    pass  # Do something with arg

arg = None  # placeholder input for the example

# Each invocation can request its own resources; Ray schedules the
# tasks according to what is available on the cluster.
task_1 = ray.remote(num_cpus=2, num_gpus=1)(remote_task).remote(arg)
task_2 = ray.remote(num_cpus=4, num_gpus=2)(remote_task).remote(arg)
task_3 = ray.remote(num_cpus=1, num_gpus=0)(remote_task).remote(arg)

# Block until all three tasks have finished.
ray.get([task_1, task_2, task_3])

If a task is infeasible to schedule, it will be queued and run as soon as it is possible to do so. Though yeah, the way Ray Tune keeps track of used resources is quite nice.
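For your M models x H holdouts case, you could also set the resources per model with .options() — a rough sketch with made-up names and values:

import ray

ray.init()

@ray.remote
def evaluate(model_name, holdout_id):
    # Hypothetical: fit the baseline model on this holdout and return a score.
    return model_name, holdout_id, 0.0

# Made-up per-model resource requirements.
resources = {
    "model_a": {"num_cpus": 2, "num_gpus": 1},
    "model_b": {"num_cpus": 1, "num_gpus": 0},
}

# Launch all M x H tasks at once; Ray queues whatever does not fit yet.
refs = [
    evaluate.options(**resources[model]).remote(model, holdout)
    for model in resources
    for holdout in range(5)  # H holdouts
]
results = ray.get(refs)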

If you want to, you can share your single-purpose script and we can see if there are any quick wins to parallelize it better!