I deal with objects of varying sizes (between a few MB and many GB). When the objects are small I don’t mind them being moved between nodes; when they are large I want them to remain on a single node unless there is a severe lack of resources.
What would be useful is a way to “control” which nodes objects are sent to, and which nodes jobs should be prioritised on.
E.g. if I have
Node 1 - 10 CPUs, 5 GB plasma
Node 2 - 5 CPUs, 3 GB plasma
and, let’s say, 3 objects, each needing a different number of operations:
Obj1 - 10 MB, needs about 20 jobs taking 5-10 seconds each
Obj2 - 50 MB, needs about 20 jobs taking 30 seconds each
Obj3 - 3 GB, needs about 100 jobs taking 1 minute each
Now, ideally, I could do something like this (pseudocode — I don’t think this API exists):
ray.put(Obj1, node1)
ray.put(Obj2, node1)
ray.put(Obj3, node1)
ray.put(Obj3, node2) # saturates node 2
for job in obj1_jobs:
    job.remote(node1)
for job in obj2_jobs:
    job.remote(node1)
for job in obj3_jobs:
    job.remote([node1, node2])
and let the scheduler figure out the rest.
I am new to distributed computing, so I’m unsure whether what I’m describing just hints at a bad architecture.