How do you gracefully handle "actor died"?

Please consider the following toy application:

import ray

class DowningToolsError(Exception):
    pass

@ray.remote
class Supervisor:

    def __init__(self):

        print('Supervisor working')

        self._worker1 = Worker1.remote()
        
        try:
            self._worker2 = Worker2.remote()

        except DowningToolsError:
            print(f"Worker did not like that idea! Let's try another plan!")

@ray.remote
class Worker1:

    def __init__(self):

        print('Worker1 working')

@ray.remote
class Worker2:

    def __init__(self):

        raise DowningToolsError('You must be kidding me!')

if __name__ == '__main__':

    supervisor = Supervisor.remote()

    while True:
        pass

What do I have to do instead of:

  except DowningToolsError:
      print(f"Worker did not like that idea! Let's try another plan!")

to make Supervisor gracefully handle Worker2’s failure?

I have tried the following:

except ray.exceptions.RayActorError:
    print(f"Worker2 is not available. Let's try another plan!")

but that did not have the desired effect.

Any suggestions would be much appreciated!

Hi @Hapi ,

Worker2.remote() itself shouldn’t throw an exception, because it returns an actor handle right away. The exception is thrown when you actually call an actor method, like:

try:
    ray.get(self._worker2.some_method.remote())
except ray.exceptions.RayActorError:
    ...

In my real application, Worker2 connects to an external websockets server, and it does so in its __init__ method. If the server is unavailable, the connection attempt raises an error, and I need to catch it somewhere so that I can shut everything down gracefully. How do other people handle similar situations in Ray?