1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
- Ray version: 2.49
- Python version: 3.12
- OS: AWS Linux
- Cloud/Infrastructure: AWS
- Other libs/tools (if relevant): ECS on EC2
3. What we are trying to do:
We are investigating whether it’s possible to run a Ray cluster where the head node is spun up as an ECS service, which then autoscales workers as usual by creating EC2 instances.
Instead of using ray up, we are trying to start the head node task with:
ray start --head --dashboard-host=0.0.0.0 --port=6379 --dashboard-port=80 --disable-usage-stats --autoscaling-config ray_bootstrap_config.yaml
We pulled ray_bootstrap_config.yaml from one of our other clusters and modified it slightly to match the environment we are trying this in.
4. What is happening:
Not much… It all starts without errors, but the cluster isn’t coming up. I can see the following:
$ ray status:
No cluster status. It may take a few seconds for the Ray internal services to start up.
$ ray cluster-dump ray_bootstrap_config.yaml:
2025-10-30 08:21:47,624 WARN commands.py:1569 -- You are about to create a cluster dump. This will collect data from cluster nodes.
The dump will contain this information:
- The logfiles of your Ray session
This usually includes Python outputs (stdout/stderr)
- Debug state information on your Ray cluster
e.g. number of workers, drivers, objects, etc.
- Your installed Python packages (`pip freeze`)
- Information on your running Ray processes
This includes command line arguments
If you are concerned about leaking private information, extract the archive and inspect its contents before sharing it with anyone.
2025-10-30 08:21:47,624 INFO cluster_dump.py:563 -- Retrieving cluster information from ray cluster file: ray_bootstrap_config.yaml
2025-10-30 08:21:47,912 INFO commands.py:389 -- Checking AWS environment settings
2025-10-30 08:21:47,913 VINFO utils.py:149 -- Creating AWS resource `ec2` in `eu-west-1`
And that’s basically the state it stays in until I kill it.
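In case it helps anyone reproduce this, here is a minimal Python sketch for pulling the tail of the autoscaler monitor files on the head node. It assumes the default Ray log directory /tmp/ray/session_latest/logs and the monitor.* file names (monitor.log, monitor.out, monitor.err); the path will differ if ray start was given a custom --temp-dir. The helper name tail_monitor_logs is made up for illustration and is not part of Ray.

```python
from collections import deque
from pathlib import Path

# Assumption: default Ray log location on the head node.
DEFAULT_LOG_DIR = Path("/tmp/ray/session_latest/logs")


def tail_monitor_logs(log_dir=DEFAULT_LOG_DIR, max_lines=50):
    """Return {file name: last `max_lines` lines} for each monitor.* file.

    Hypothetical helper: it only reads files from disk and does not
    talk to Ray itself.
    """
    tails = {}
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        # No session directory yet, so there is nothing to report.
        return tails
    for path in sorted(log_dir.glob("monitor.*")):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            # deque with maxlen keeps only the last `max_lines` lines.
            last_lines = deque(handle, maxlen=max_lines)
        tails[path.name] = [line.rstrip("\n") for line in last_lines]
    return tails


if __name__ == "__main__":
    found = tail_monitor_logs()
    if not found:
        print(f"No monitor.* files found under {DEFAULT_LOG_DIR}")
    for name, lines in found.items():
        print(f"==> {name} <==")
        print("\n".join(lines))
```

If this prints nothing, the monitor process may not have started at all, which would be useful to know in itself.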
5. Questions:
- How can I get some more information on what the auto-scaler is trying to do? Are there other logs I can inspect, or documentation on something similar I can look into? Any tips?
- Is this even an approach worth pursuing? I can spend more time trying to make this work, but if the hive mind tells me it’s a stupid idea, I will go back to the drawing board.

Thx!