1. Severity of the issue: (select one)
- None: I’m just curious or want clarification.
- Low: Annoying but doesn’t hinder my work.
- Medium: Significantly affects my productivity but can find a workaround.
- High: Completely blocks me.
2. Environment:
- Ray version: 2.43.0
- Python version: 3.11.0
- OS: Ubuntu 22.04.4 LTS
- Cloud/Infrastructure: Azure
- Other libs/tools (if relevant): Ray on Databricks Spark
3. Question:
Assumptions (correct me if I am wrong about any of these):
- Each trial is allocated 1 CPU by default (see the snippet after this list).
- Each trial corresponds to a Ray task.
- Each Ray task is scheduled on a Ray Worker that has sufficient resources for it.
- Each Ray Worker runs only one Ray task at a time.
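To spell out the first assumption, this is the default I have in mind (a minimal sketch, not my actual workload; `trial_like_task` is just a stand-in name):

```python
import ray

ray.init()

# A plain Ray task requests 1 CPU by default: @ray.remote with no
# arguments behaves like @ray.remote(num_cpus=1) for tasks. I am
# assuming each Tune trial makes the same 1-CPU request.
@ray.remote
def trial_like_task():
    return "done"

print(ray.get(trial_like_task.remote()))
```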
Given the above, assume I have:
- 8 Ray Workers (4 CPUs each, 32 CPUs total)
- 50 trials to be scheduled, with default resource usage and no `max_concurrency` cap (a minimal sketch of this setup follows the list)
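For reference, here is roughly the kind of run I mean; `objective` is a placeholder, not my real training code:

```python
from ray import tune

def objective(config):
    # Trivial placeholder trainable; each trial should request 1 CPU
    # by default since no resources are specified.
    return {"score": config["x"] ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"x": tune.uniform(0.0, 1.0)},
    # 50 trials, default resource requests, no concurrency cap.
    tune_config=tune.TuneConfig(num_samples=50),
)
results = tuner.fit()
```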
With all of that, am I correct that the following will occur?
- At first, 8 trials will be started, one on each of the 8 Ray Workers.
- These running trials will use 8 CPUs (1 per trial, the default), with the other 24 CPUs either used by other Ray processes or idling.
- When one trial finishes, another will be scheduled in its place, so that as long as there are more trials to schedule, there will always be 8 trials running concurrently? (If I wanted to force this cap explicitly, I believe it would look like the snippet below.)
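For comparison, my understanding is that an explicit cap would be set like this (reusing the placeholder `objective` from the sketch above):

```python
tuner = tune.Tuner(
    objective,
    param_space={"x": tune.uniform(0.0, 1.0)},
    tune_config=tune.TuneConfig(
        num_samples=50,
        # Explicit limit on simultaneously running trials;
        # I currently leave this unset.
        max_concurrent_trials=8,
    ),
)
```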
I am confused because I ran a tuning job with this configuration and at one point saw 22 trials in the RUNNING state, and I do not understand how 22 trials could be running at the same time.
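In case the mismatch is simply that the cluster has more CPUs than I think, this is how I plan to double-check what Ray actually sees (assuming the driver is attached to the already-running cluster):

```python
import ray

ray.init(address="auto")  # attach to the running cluster

# Total resources registered with the cluster, e.g. {'CPU': 32.0, ...}
print(ray.cluster_resources())

# Resources not currently claimed by tasks or actors
print(ray.available_resources())
```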