What is the correct way of handling an exception from a list of tasks executed with ray.get?
    import ray

    ray.init()

    @ray.remote(max_retries=5)
    def f(i):
        try:
            save(i)  # save() writes dataset i to disk
        except Exception:
            raise Exception

    x = ray.get([f.remote(i) for i in range(20)])
In the code above, it is possible that one of the f tasks raises an exception, but I would like the other tasks to complete.
A use case for this would be saving data to disk. If saving a dataset fails in one of the task workers, then one solution would be to retry that task.
What would be the best way to handle this problem? I am not referring to a WorkerCrashedError, although I guess a task can be thought of as inevitably crashing when an exception is raised.
I have tried:
    from ray.exceptions import RayError

    try:
        x = ray.get([f.remote(i) for i in range(20)])
    except RayError as e:
        print(e.pid)

However, I only ever get back the pid of one of the tasks that fails, not multiple pids when more than one task raises an exception.
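For reference, one pattern that might give per-task errors is to resolve each object reference separately, since a single ray.get over the whole list raises on the first failure it encounters. The sketch below is only an illustration under assumptions: collect_outcomes and demo_with_ray are hypothetical names, the failing branch stands in for a failing save(i), and retry_exceptions=True is used on the assumption that max_retries alone may not retry application-level exceptions.

```python
from typing import Any, Callable, Iterable, List, Tuple


def collect_outcomes(
    refs: Iterable[Any], get: Callable[[Any], Any]
) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, Exception]]]:
    """Resolve each reference on its own so one failure does not hide the others.

    `get` is expected to behave like ray.get on a single ObjectRef:
    return the task's value, or raise if the task failed.
    """
    successes: List[Tuple[int, Any]] = []
    failures: List[Tuple[int, Exception]] = []
    for index, ref in enumerate(refs):
        try:
            successes.append((index, get(ref)))
        except Exception as exc:  # with Ray this would typically be a RayTaskError
            failures.append((index, exc))
    return successes, failures


def demo_with_ray() -> None:
    """Hypothetical usage; needs Ray installed and is not run automatically."""
    import ray

    ray.init()

    # Assumption: retry_exceptions=True asks Ray to also retry on exceptions
    # raised by the task body, up to max_retries times.
    @ray.remote(max_retries=5, retry_exceptions=True)
    def f(i):
        if i % 5 == 0:  # stand-in for a failing save(i)
            raise IOError(f"could not save dataset {i}")
        return i

    # All tasks are submitted up front, so they still run in parallel;
    # only the result collection is done one reference at a time.
    refs = [f.remote(i) for i in range(20)]
    successes, failures = collect_outcomes(refs, ray.get)
    for index, exc in failures:
        # The pid may not be present on every exception type, hence getattr.
        print(index, getattr(exc, "pid", None), exc)
```

Calling demo_with_ray() on a machine with Ray installed should print one line per failed task rather than only the first, while the successful results remain available in successes.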