Running a Ray head node as an ECS service and expecting it to autoscale EC2 worker nodes is not a documented Ray deployment pattern. The Ray autoscaler expects to manage both head and worker nodes itself, typically via ray up or by running the head node on an EC2 instance with the correct IAM permissions and network access. When the head node runs in ECS, the autoscaler may lack the permissions or network visibility needed to launch and manage EC2 worker nodes, which can leave the cluster unable to come up, as you've observed. According to the Ray community, this approach is not supported or recommended, and similar issues have been reported when deviating from the documented cluster launcher workflows (discuss.ray.io, github.com/ray-project/ray/issues/39565).
For debugging, inspect the autoscaler logs on the head node (e.g., /tmp/ray/session_latest/logs/monitor*, cluster_monitor.log, and cluster_monitor.err) to see what the autoscaler is attempting. The ray cluster-dump command can also collect logs and debug state, though if the autoscaler cannot launch or reach workers at all, these logs may not be very informative. The recommended approach is to run the standard Ray cluster launcher (ray up) against AWS EC2 directly, not ECS, so the autoscaler can manage the full lifecycle of both head and worker nodes (discuss.ray.io). Would you like more detail on how to set up Ray clusters on AWS using the supported methods?
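As a starting point for the supported workflow, here is a sketch of a minimal AWS cluster config for ray up. The cluster name, region, and instance types are placeholders; adjust them (and add an AMI, IAM profile, etc. as needed) for your account before launching.

```yaml
# cluster.yaml -- hypothetical minimal Ray cluster launcher config for AWS.
# All values below are example placeholders, not recommendations.
cluster_name: my-cluster

provider:
  type: aws
  region: us-west-2          # placeholder region

auth:
  ssh_user: ubuntu           # user must match the AMI you choose

max_workers: 4               # autoscaler upper bound on worker nodes

available_node_types:
  head:
    node_config:
      InstanceType: m5.large
  worker:
    min_workers: 0           # autoscaler scales workers up from zero
    node_config:
      InstanceType: m5.large

head_node_type: head
```

Launching this with ray up cluster.yaml puts the head node on EC2 with the autoscaler in control of worker instances, avoiding the permission and network-visibility issues of an ECS-hosted head node.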