Let’s say I have a cluster of 2 nodes with 10 CPUs each and 16gb of plasma storage.
Let’s say node 1 has obj1 of size 15gb and 6 cores in use, and node 2 has obj2 of size 1gb< and 5< cores in use.
Let’s say I have a job that needs to operate on obj1 and uses 5 CPU cores. Does Ray wait for the jobs on node 1 to finish before sending the job since the 15gb object is already written there? Does Ray have the ability to run it on node 2?
I’m not sure how this all works. I have a workload that has painfully slow objects to load into plasma and I want to store them across nodes, with jobs that need those objects running on those nodes unless there is an abnormally large queue at which point I’ll add the object it needs to an available node. I’m brand new to Ray, would I do this manually by keeping track of object locations, resource availability and then moving objects around?