How can tasks inherit resources from the actor that owns them?

Hi everyone, I have a question about resource management in Ray.

In my application, I run a deep learning training actor on each GPU and periodically run some tests on the learned model, which is a standard deep learning pipeline. Some of these tests need a GPU, but only a small share of one. I want to use Ray to run them in parallel to speed up testing. However, to run a test as a Ray task, I have to request new resources for it.

My current approach: with two tests that need GPUs, I schedule the training actor with 0.8 GPUs and each test with 0.1 GPUs. This works, but it is not elegant: the split has to be re-tuned whenever the number of tests changes, and the tests may be scheduled on a different device from the one the training actor uses.

One thought: since the training actor is the owner of the test tasks, could the tasks simply inherit their owner's resources? That would make the program more flexible and easier to understand, because it would behave like multiprocessing on a local machine.

You can use placement groups for this. Create a placement group with 3 bundles (one for the training actor and one for each of the two tests), then assign the training actor to one bundle and the tasks to the other bundles.

https://docs.ray.io/en/master/placement-group.html#placement-groups
