However, they do not introduce any task scheduling dependencies either. I ran a simple experiment to verify this. The same holds for Ray DAGs/workflows that use ray.bind instead.
My main question is: is there any danger of starvation, i.e., tasks with implicit dependencies being scheduled before their dependencies and then getting stuck waiting for them? Even if starvation is not possible, what about scheduling inefficiencies, i.e., too many tasks with implicit dependencies occupying cores just to wait for their dependencies to become available? If either of these is an issue, would it make sense to allow invocations of ray.remote(*args, **kwargs) to somehow accept a set of extra scheduling dependencies?
There isn’t a starvation danger, since Ray eagerly executes tasks (even if no one has called ray.get() on the result object reference yet). This can indeed cause inefficiencies, though, so you should be careful when passing objects by reference.
Ray does some smart things to avoid deadlock in these cases, by releasing resources when ray.get() is called. For example, if a task calls ray.get() on an object ref, the CPU slot of the current task is released for other tasks to use.
Regarding the final issue, there isn’t any way to wait on a dep unless you take it as an argument. You could always pass it as a *arg to force resolution.
Does this also happen on ray.wait()? Will the CPU resource be released in that case too? Are these two facts (ray.get and ray.wait releasing the CPU resource) documented anywhere?
It was a little counter-intuitive to hear that nested ObjectRefs don’t create a scheduling dependency. I was aware that they don’t get dereferenced, but it was counter-intuitive that they do not create a scheduling dependency at all! I thought nested object refs vs. top-level object refs were like ray.wait vs. ray.get.
If ray.wait gives up the CPU like ray.get, then we can at least replicate the behaviour we want, i.e. coordinate scheduling via nested object refs without fetching the actual data. We need this because we have a file system side effect that requires a certain order of execution.