Ray Core scheduling vs Tune Scheduler

I assume ray.tune.run converts my trainable function into a remote task. If so, what happens to the scheduling strategy for the remote task (like DEFAULT or SPREAD)? Does it get overridden by the Tune scheduler (e.g. FIFO, PBT, or ASHA)?
I really want to force Tune schedulers like PBT to use “SPREAD” for all trials, so that I am sure all trials are running in parallel.
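
To make the question concrete, here is a minimal sketch of what I mean (the trainable and search space are just placeholders):

    import ray
    from ray import tune
    from ray.tune.schedulers import PopulationBasedTraining

    # For a plain remote task, I can force SPREAD explicitly:
    @ray.remote(scheduling_strategy="SPREAD")
    def some_task():
        return 1

    # But with Tune, I only see the trial scheduler argument --
    # where does the Core scheduling strategy for each trial go?
    def my_trainable(config):
        return {"score": config["x"]}

    tune.run(
        my_trainable,
        config={"x": tune.uniform(0, 1)},
        scheduler=PopulationBasedTraining(
            time_attr="training_iteration",
            metric="score",
            mode="max",
            hyperparam_mutations={"x": tune.uniform(0, 1)},
        ),
    )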

Good question. I’ll forward that to the Ray AIR team.
cc: @kai @amogkam


PBT and ASHA are hyperparameter search algorithms. Ray Core does not use them for task/actor scheduling.
Whether your trials can run in parallel is mainly decided by how many resources your cluster has.
I do think we use SPREAD for scheduling trials.
Also note that if your trials are able to use the entire cluster’s resources, there would not be much difference between the SPREAD and PACK scheduling strategies; Ray will make sure all available resources are properly used.
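
For example, parallelism mostly falls out of the per-trial resource request rather than the Core strategy (a minimal sketch; the numbers are illustrative):

    from ray import tune

    def train_fn(config):
        return {"score": config["x"]}

    # With 1 CPU requested per trial, a cluster with N free CPUs can run
    # up to N trials in parallel, whether Core packs or spreads them.
    tuner = tune.Tuner(
        tune.with_resources(train_fn, {"cpu": 1}),
        param_space={"x": tune.uniform(0, 1)},
        tune_config=tune.TuneConfig(num_samples=8),
    )
    tuner.fit()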

@gjoliver thanks for the reply. As I understand it, Ray uses the “DEFAULT” strategy by default, which is considerably different from SPREAD. So are you saying that when PBT is used, it will automatically change to SPREAD?

Can you point me to which default policy you are talking about?

Please check this page for scheduling strategies: Scheduling — Ray 2.3.0

First few lines from the “DEFAULT” strategy:

“DEFAULT”

"DEFAULT" is the default strategy used by Ray. Ray schedules tasks or actors onto a group of the top k nodes.

I understand Ray Core has the concept of scheduling strategy.
I am curious if any of our Ray Tune documentation tells our users to worry about core scheduling strategy when configuring their trials.

Ohh dear, you are just trying to shut down a very relevant question. Do you mean to say people should only talk about what is written in the docs, even when it is confusing?
If Ray is not able to spread all trials across the available resources, there is a question. My work needs to spread all trials, but as per the docs, the Tune scheduler and the Ray Core scheduler do not seem to be in sync, or at least this is not explained.

Sorry for not being clear. Let me clarify my answers:

  1. It shouldn’t matter whether Core is scheduling tasks and actors in PACK or SPREAD mode. These are best-effort modes. Tune should be able to utilize all available resources in your cluster to fit as many trials as possible.
  2. Note that Tune schedules trials with placement groups, and placement groups do have modes like STRICT_PACK and STRICT_SPREAD. If you have configured your Tune run to use those strategies, it’s possible Core would refuse to schedule some tasks. But as long as you are not doing something like the following:
        from ray import tune

        def train(config):  # placeholder trainable
            ...

        tuner = tune.Tuner(
            tune.with_resources(
                train,
                resources=tune.PlacementGroupFactory([
                    {"CPU": 1, "GPU": 0.5, "custom_resource": 2},
                    {"CPU": 2},
                    {"CPU": 2},
                ], strategy="STRICT_PACK")  # or STRICT_SPREAD
            )
        )
        tuner.fit()

You should be fine. (For the opposite goal, a non-strict SPREAD example is sketched after this list.)
  3. I kept asking for references because I genuinely wanted to know if we have issues with our documentation that caused your confusion. Normally Tune users shouldn’t need to care about these Core primitives. So if you can think of any documentation that led you to these questions, please share it so we can improve.
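
Here is the non-strict version I mentioned: if you do want Tune to prefer spreading trials, a soft SPREAD placement group strategy expresses that without risking unschedulable trials (a sketch; the bundle shape is illustrative):

    from ray import tune

    def train(config):  # placeholder trainable
        ...

    tuner = tune.Tuner(
        tune.with_resources(
            train,
            resources=tune.PlacementGroupFactory(
                [{"CPU": 1}],
                strategy="SPREAD",  # best-effort, unlike STRICT_SPREAD
            ),
        )
    )
    tuner.fit()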

Hope this helps.

I am not sure if you got a chance to check the DEFAULT Core strategy explained on that page:

“Ray selects the best node for scheduling by randomly picking from the top k nodes with the lowest scores. The value of k is the max of (number of nodes in the cluster * RAY_scheduler_top_k_fraction environment variable) and RAY_scheduler_top_k_absolute environment variable. By default, it’s 20% of the total number of nodes.”

From this it seems like Ray is not going to utilize all the available CPUs in the cluster (just the top k, and the default is 20%). But the PBT scheduler says it will schedule all trials in parallel, so I wonder how that is going to work, since Core may or may not let it.
I noticed this behavior while testing PBT on a large cluster: many CPUs (with enough RAM) were free, and at the same time trials were pending, so I was trying to figure out this issue. My Ray cluster had about 20 pods with 10 CPUs each.
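
One thing I was considering, based on the env vars quoted above, is widening the candidate set (a sketch; I have not verified this helps, and on a multi-node cluster the variable would presumably need to be set on every node, not just in the driver):

    import os

    # Env var from the Scheduling docs quoted above; 1.0 would make the
    # "DEFAULT" policy consider all nodes rather than the default top 20%.
    os.environ["RAY_scheduler_top_k_fraction"] = "1.0"

    import ray
    ray.init()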

Am I interpreting it wrong?