I deal with objects of varying sizes (between a few MB and many GB). When the objects are small I don’t mind them being moved between nodes; when they are large I want them to remain on a single node unless there is a severe lack of resources.
What would be useful is a way to “control” which nodes objects are sent to, and which nodes jobs should be prioritised on.
E.g. if I have
Node 1 - 10 CPUs, 5 GB plasma
Node 2 - 5 CPUs, 3 GB plasma
and, let’s say, 3 objects, each needing a different number of operations:
Obj1 - 10 MB, needs about 20 jobs taking 5-10 seconds each
Obj2 - 50 MB, needs about 20 jobs taking 30 seconds each
Obj3 - 3 GB, needs about 100 jobs taking 1 minute each
Now, ideally, I could do something like this (pseudocode — I don’t think this API exists):
ray.put(Obj1, node1)
ray.put(Obj2, node1)
ray.put(Obj3, node1)
ray.put(Obj3, node2) # saturates node 2
for job in obj1_jobs:
    job.remote(node1)
for job in obj2_jobs:
    job.remote(node1)
for job in obj3_jobs:
    job.remote([node1, node2])
and let the scheduler figure out the rest.
I am new to distributed computing, so I’m unsure whether what I’m describing just hints at a bad architecture.