Ray actor creation takes 5 minutes

Hey guys, first of all, thanks for creating such an awesome library. I love it.

I’m running a Ray cluster of 2 nodes connected to each other via Docker Swarm. Everything works smoothly on the head node; only on the worker node is actor creation delayed, by about 5 minutes. The actor gets scheduled immediately but hangs in the PENDING_CREATION state. I checked the raylet logs (raylet.out) and found this line:
[2022-02-04 09:55:55,954 I 3848 3848] worker_pool.cc:418: Started worker process of 1 worker(s) with pid 8387
That timestamp matches the moment I create the actor, but the actual creation, until the actor becomes ALIVE, takes about 5 minutes.
Is this normal behavior? If not, how can I debug what’s happening?

From gcs_server.out:
[2022-02-04 09:55:59,687 I 894 894] gcs_actor_scheduler.cc:328: Start creating actor f764b411021c014e03cc84f39d000000 on worker 3bd407c88accc72895c18566b04aeb6d92ff24e88f380e8280a9540b at node cab011462be950a4805179616b4d2c85ea4a9fa4695b81c5f7f4f9bb, job id = 9d000000
[2022-02-04 10:01:25,472 I 894 894] gcs_actor_scheduler.cc:367: Succeeded in creating actor f764b411021c014e03cc84f39d000000 on worker 3bd407c88accc72895c18566b04aeb6d92ff24e88f380e8280a9540b at node cab011462be950a4805179616b4d2c85ea4a9fa4695b81c5f7f4f9bb, job id = 9d000000
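To put a number on the gap for every actor rather than eyeballing timestamps, a small log-parsing script can pair each "Start creating actor" line with its "Succeeded in creating actor" line. This is a minimal sketch, not a Ray tool: the helper names (`parse_ts`, `actor_creation_delays`) are hypothetical, and the regex assumes the exact line format shown in the excerpt above, which may differ between Ray versions.

```python
import re
from datetime import datetime

# Assumed line format, taken from the gcs_server.out excerpt above:
# [YYYY-MM-DD HH:MM:SS,mmm <level> <pid> <tid>] gcs_actor_scheduler.cc:<n>: <event> actor <id> ...
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})[^\]]*\]\s+"
    r"gcs_actor_scheduler\.cc:\d+:\s+"
    r"(?P<event>Start creating|Succeeded in creating) actor (?P<actor>[0-9a-f]+)"
)


def parse_ts(ts):
    """Parse a log timestamp such as '2022-02-04 09:55:59,687'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def actor_creation_delays(lines):
    """Map actor ID -> seconds between 'Start creating' and 'Succeeded in creating'."""
    started = {}
    delays = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an actor-scheduler line we recognize
        ts = parse_ts(m.group("ts"))
        actor = m.group("actor")
        if m.group("event") == "Start creating":
            started[actor] = ts
        elif actor in started:
            # Only report actors whose start line was also seen.
            delays[actor] = (ts - started.pop(actor)).total_seconds()
    return delays


# The two lines quoted in this post.
SAMPLE = [
    "[2022-02-04 09:55:59,687 I 894 894] gcs_actor_scheduler.cc:328: Start creating actor "
    "f764b411021c014e03cc84f39d000000 on worker 3bd407c88accc72895c18566b04aeb6d92ff24e88f380e8280a9540b "
    "at node cab011462be950a4805179616b4d2c85ea4a9fa4695b81c5f7f4f9bb, job id = 9d000000",
    "[2022-02-04 10:01:25,472 I 894 894] gcs_actor_scheduler.cc:367: Succeeded in creating actor "
    "f764b411021c014e03cc84f39d000000 on worker 3bd407c88accc72895c18566b04aeb6d92ff24e88f380e8280a9540b "
    "at node cab011462be950a4805179616b4d2c85ea4a9fa4695b81c5f7f4f9bb, job id = 9d000000",
]

for actor, secs in actor_creation_delays(SAMPLE).items():
    print(f"{actor}: {secs:.1f} s")  # prints "f764b411021c014e03cc84f39d000000: 325.8 s"
```

On a live cluster you could pass an open file instead of `SAMPLE`, e.g. the `gcs_server.out` under Ray's default log directory (`/tmp/ray/session_latest/logs`), to see whether the delay affects only actors placed on the worker node.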

Cheers

Edit: recreating the cluster solved the problem.


This post was hidden, so I created a new topic in the Ray Clusters category. Apparently this problem arises once the Ray cluster has been running for a few hours. Here is the second thread: