How to find if an ObjectRef failed without an expensive ray.get() call?

Hi all,

I have a question about checking ObjectRefs for failure before calling ray.get(). The use case is that I may have many outstanding ObjectRefs, and if any of them fail I want to exit my program immediately. However, these ObjectRefs are intermediate values, and I’d rather not call ray.get() on the head node, because the contents are pretty hefty. My understanding is that ray.get() would incur the I/O cost of moving those values to the head node. Ideally I would be able to determine whether any failed and, if not, invoke more Ray remote functions to consume the intermediate ObjectRefs.

It looks like maybe in the past you could use ray.error_info?

Maybe I could call ObjectRef.as_future() and use the exception check on the resulting future (see the Futures page in the Python 3.9.5 documentation)? Or is the exception only available after an expensive ray.get() call?

Hi,

I have a similar problem. I am trying to invoke a sequence of ray.remote calls and only issue the next set if all the previous calls succeeded. I use finished = ray.wait(..., fetch_local=False) to wait for completion, but then I’d like to check (in an inexpensive way) whether every element of finished ran without an exception.
How can I go about doing that with Ray?
Thanks,

Tom