I have an actor:
@ray.remote
class SimulationWorkerActor:
def __init__(self):
self.loop = None
async def run_loop(self, loop: Loop, split_id: int):
self.split_id = split_id
self.loop = loop
start = time.time()
print(f'Started loop for split {split_id}')
loop.run() # blocking cpu intensive computation
self.run_loop_time = time.time() - start
print(f'Finished loop for split {split_id} in {self.run_loop_time}s')
And orchestration code:
actors = [SimulationWorkerActor.options(num_cpus=1).remote() for _ in range(len(self.generators))]
print(f'Inited {len(actors)} worker actors')
refs = [actors[i].run_loop.remote(
loop=Loop(...),
split_id=i
) for i in range(len(actors))]
print(f'Scheduled loops, waiting for finish...')
# wait for all runs to finish
ray.get(refs)
What I expect is all of the run_loop
methods to run in parallel, however what I get from logs is that they are executed sequentially by Ray cluster:
Scheduled loops, waiting for finish...
(SimulationWorkerActor pid=11588, ip=10.244.3.10) Started loop for split 0
(SimulationWorkerActor pid=11588, ip=10.244.3.10) Finished loop for split 0 in 2.4789621829986572s
(SimulationWorkerActor pid=11412, ip=10.244.2.10) Started loop for split 1
(SimulationWorkerActor pid=11412, ip=10.244.2.10) Finished loop for split 1 in 2.550433397293091s
(SimulationWorkerActor pid=9168, ip=10.244.0.10) Started loop for split 2
(SimulationWorkerActor pid=9168, ip=10.244.0.10) Finished loop for split 2 in 2.5661652088165283s
(SimulationWorkerActor pid=8806, ip=10.244.4.11) Started loop for split 3
(SimulationWorkerActor pid=8806, ip=10.244.4.11) Finished loop for split 3 in 2.499436140060425s
Why is this happening? How do I make my actors work independently, in parallel?
My setup:
Ray 2.4.0, cluster runs in minikube on M2 mac