Scheduling with data affinity


Is there a way to schedule a task according to data affinity? if not is this something that is planned?

Say a task has dependency on 4 objects (equally sized for the sake of the example) 3 are located on node 1 and another one on node 2.
According to the whitepaper if the task originated from node 2 and the node has enough resources it will be scheduled on the node 2 and will transfer the 3 objects from node 1 to the node’s plasma.

In this case it might be beneficial to schedule the task on node 1 so it will transfer only 1 object between the nodes.


1 Like

Hi Guy!

There is actually a feature called locality-aware leasing that does exactly this! In the case that you mention, if more task dependency bytes are on node 1 than node 2, and node 1 has resources available to execute the task, the task will be scheduled onto node 1, even if the task was submitted on node 2. In other words, our scheduling policy changed to locality-first scheduling instead of local-first scheduling. This scheduling policy was released in version 1.2, with further optimizations to be released in the upcoming 1.3 release.

1 Like

@Clark_Zinzow Great! thanks for the info!