Simulating preemption while developing?

The current ActorPool does not support actor preemption well: when a machine is preempted, the actor is killed and the entire job dies.

I would like to develop an ActorPool that is more resilient to preemption (identifying dying actors, returning their failed jobs to the queue, and instantiating replacement actors). I think I have a general idea of how to approach that, but what I don't have is a good way to simulate preemption, short of the very time-consuming and cumbersome process of stopping the GCP machine. What would be a realistic way to simulate preemption during development?

When we simulate multi-node clusters, we use ray/ at master · ray-project/ray · GitHub. (Note that this is not a public API.) You can simulate preempted nodes like this:

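import time
from ray.cluster_utils import Cluster

c = Cluster()
# The first node is always a head node
c.add_node()
# 4 worker nodes
worker_nodes = []
for _ in range(4):
    worker_nodes.append(c.add_node())
# Wait until all nodes are ready.
c.wait_for_nodes()

while True:
    time.sleep(10)  # pause between simulated preemptions
    preempted_node = worker_nodes.pop(0)
    c.remove_node(preempted_node)  # kill the node, as a preemption would
    worker_nodes.append(c.add_node())  # bring up a replacement node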

You can then create another script (the driver) that connects to this fake cluster using ray.init(address="auto").
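On the driver side, the pool logic described in the question (detect a dead actor, return its job to the queue, start a replacement) can be prototyped without a cluster before pointing it at the fake one. The following is only a minimal sketch of that idea in plain Python, not Ray's ActorPool: `ActorDiedError`, `FlakyWorker`, and `run_with_requeue` are hypothetical names made up for illustration, and jobs run one at a time rather than in parallel. In a real Ray driver, the failure would presumably surface as `ray.exceptions.RayActorError` when calling `ray.get` on a task submitted to the dead actor.

```python
from collections import deque


class ActorDiedError(Exception):
    """Hypothetical stand-in for the error raised when an actor's node dies."""


class FlakyWorker:
    """Hypothetical worker that 'dies' after a fixed number of calls."""

    def __init__(self, calls_before_death):
        self.calls_left = calls_before_death

    def run(self, job):
        if self.calls_left <= 0:
            raise ActorDiedError("worker was preempted")
        self.calls_left -= 1
        return job * 2


def run_with_requeue(jobs, make_worker, num_workers=2, max_attempts=5):
    """Process every job; requeue jobs whose worker died and replace the worker."""
    queue = deque((job, 1) for job in jobs)
    workers = deque(make_worker() for _ in range(num_workers))
    results = {}
    while queue:
        job, attempt = queue.popleft()
        worker = workers.popleft()
        try:
            results[job] = worker.run(job)
            workers.append(worker)  # healthy worker goes back into rotation
        except ActorDiedError:
            if attempt >= max_attempts:
                raise  # give up on a job that keeps failing
            queue.append((job, attempt + 1))  # return the failed job to the queue
            workers.append(make_worker())  # start a replacement worker
    return results


# Each worker survives two calls, so some jobs fail once and are retried.
results = run_with_requeue(range(6), lambda: FlakyWorker(calls_before_death=2))
print(results)
```

The attempt counter is one possible guard against a job that kills every worker it lands on; whether to cap retries, and at what number, is a design choice for the real pool.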
