1. Severity of the issue: (select one)
- None: I’m just curious or want clarification.
- Low: Annoying but doesn’t hinder my work.
- Medium: Significantly affects my productivity but can find a workaround.
- High: Completely blocks me.
2. Environment:
- Ray version: 2.43.0
- Python version: 3.11.0
- OS: Ubuntu 22.04.4 LTS
- Cloud/Infrastructure: Azure
- Other libs/tools (if relevant): Ray on Databricks Spark
3. What happened vs. what you expected:
Context:
- num_samples=50
- Spark cluster set up with 4 workers of 8 cores each (non-autoscaling Spark cluster)
- Ray cluster set up so that each Ray worker node has 8 CPUs (one Ray worker per Spark worker)
- Ray cluster set up to autoscale from 1 to 4 Ray worker nodes (see the setup sketch after this list)
- No per-trial resources specified, so each trial should default to 1 CPU if I’m not mistaken
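
For reference, a minimal sketch of how the Ray-on-Spark cluster is brought up, assuming the `ray.util.spark` API; the exact call in my notebook may differ slightly, but the values mirror the bullets above:

```python
# Minimal sketch of the Ray-on-Spark cluster setup described above.
# Parameter names follow ray.util.spark in recent Ray 2.x releases.
import ray
from ray.util.spark import setup_ray_cluster

setup_ray_cluster(
    min_worker_nodes=1,      # start with a single Ray worker node
    max_worker_nodes=4,      # autoscale up to 4 Ray worker nodes (one per Spark worker)
    num_cpus_worker_node=8,  # 8 CPUs per Ray worker node
)
ray.init()  # connect to the cluster that was just started
```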
Behavior:
- Expected: When tuning with TuneBOHB and no max_concurrency specified, there should be no limit on trial concurrency, so Tune should schedule as many trials as resources allow, i.e. run trials on all 4 workers at a time. In particular, if the cluster starts with 1 Ray worker, the autoscaler should keep requesting additional workers until it reaches 4 (minimal repro sketch after this list).
- Actual: The entire tuning run completes using only 1 Ray worker; the cluster never scales up.
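
A minimal sketch of the tuning call; the trainable and search space are placeholders standing in for the real workload, with no max_concurrency and no per-trial resources specified, as described above:

```python
# Minimal repro sketch; the objective and search space are placeholders.
from ray import tune
from ray.tune.search.bohb import TuneBOHB
from ray.tune.schedulers import HyperBandForBOHB

def objective(config):
    # Placeholder trainable standing in for the real training function.
    return {"loss": config["x"] ** 2}

tuner = tune.Tuner(
    objective,  # no per-trial resources specified -> should default to 1 CPU per trial
    param_space={"x": tune.uniform(-10.0, 10.0)},
    tune_config=tune.TuneConfig(
        metric="loss",
        mode="min",
        num_samples=50,
        search_alg=TuneBOHB(),         # no max_concurrency set
        scheduler=HyperBandForBOHB(),  # BOHB is paired with the HyperBandForBOHB scheduler
    ),
)
results = tuner.fit()
```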