Nested objects and task scheduling dependencies

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Hello,

I have tried out the following with Ray 1.12.1. Nested object arguments are possible, but they do not get automatically dereferenced:
https://docs.ray.io/en/releases-1.12.1/ray-core/objects.html#passing-objects-by-reference

However, they do not introduce any task scheduling dependencies either. I have verified this with a simple experiment (see the sketch below). The same holds for Ray DAGs/workflows that use ray.bind instead.
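For reference, here is a minimal sketch of the kind of experiment I mean, using the standard Ray core API; the function names are just for illustration:

```python
import ray

ray.init()

@ray.remote
def producer():
    return 42

@ray.remote
def takes_value(x):
    # Top-level ObjectRef arguments are dereferenced: x arrives as the plain
    # value 42, and producer() is a scheduling dependency of this task.
    return x + 1

@ray.remote
def takes_nested(xs):
    # Nested refs are NOT dereferenced: xs is a list containing an ObjectRef,
    # and producer() is not treated as a scheduling dependency of this task.
    return ray.get(xs[0]) + 1

ref = producer.remote()
print(ray.get(takes_value.remote(ref)))     # 43
print(ray.get(takes_nested.remote([ref])))  # 43, but only after the inner ray.get()
```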

My main question is: is there any danger of starvation, i.e., tasks with implicit dependencies being scheduled before their dependencies and then being stuck waiting for them? Even if starvation is not a possibility, what about scheduling inefficiencies, i.e., too many tasks with implicit dependencies occupying cores just to wait for their dependencies to become available? If either of these is an issue, would it make sense to allow invocations of ray.remote(*args, **kwargs) to somehow accept a set of extra scheduling dependencies?

Thank you.

There isn’t any danger of starvation, since Ray eagerly executes tasks (even if no one has called ray.get() on the result object reference yet). This can indeed cause inefficiencies, so you should be careful when passing objects by reference.

Ray does some smart things to avoid deadlock in these cases, by releasing resources when ray.get() is called. For example, if a task calls ray.get() on an object ref, the CPU slot of the current task is released for other tasks to use.
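A minimal sketch of that behavior (task names are illustrative): on a cluster with a single CPU, the parent task blocks in ray.get() and releases its CPU slot, so the child can still run and no deadlock occurs.

```python
import ray

ray.init(num_cpus=1)

@ray.remote
def child():
    return "done"

@ray.remote
def parent():
    # While this ray.get() blocks, the parent's CPU slot is released,
    # so child() can be scheduled even though only one CPU is available.
    return ray.get(child.remote())

print(ray.get(parent.remote()))  # "done", no deadlock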

Regarding the final question, there isn’t any way to wait on a dependency unless you take it as an argument. You could always pass it as an extra positional argument to force resolution (see the sketch below).
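One way to express that workaround, with made-up function names for illustration: the extra argument exists only so Ray treats the upstream task as a scheduling dependency.

```python
import ray

ray.init()

@ray.remote
def upstream():
    return "ready"

@ray.remote
def downstream(payload, dep):
    # `dep` is only here to force Ray to wait for upstream() before
    # scheduling this task; its value can otherwise be ignored.
    return f"{payload} (upstream returned {dep!r})"

dep_ref = upstream.remote()
print(ray.get(downstream.remote("real work", dep_ref)))
```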