Ray Tune multi-tenancy

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.


Right now we provide an ML training platform by deploying a single Ray cluster, with autoscaling, that multiple users submit jobs to, potentially at the same time. That has been working well so far. Now we want to start incorporating Ray Tune into our platform for hyperparameter tuning. If multi-tenancy is not supported, what is the recommended way to handle our use case?

One job could run all of its trials at the same time, while another job waits a long time before it gets the resources to run even its first trial.
Can you elaborate on this? In Tune, you can also set per-trial resources, as described in the Ray Tune FAQ — Ray 2.8.0, right?

tuner = tune.Tuner(
    tune.with_resources(train_fn, resources={"cpu": 2, "gpu": 0.5, "custom_resources": {"hdd": 80}})
)

Or does it always try to use all of the resources available on the cluster?
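For intuition, here is a back-of-the-envelope sketch (the numbers are made up, not from the thread): once per-trial resources are fixed, the number of trials Tune can run concurrently is bounded by the cluster's capacity divided by the per-trial request, taken over each resource type.

```python
# Illustrative only: with fixed per-trial resources, concurrency is capped at
# floor(cluster_capacity / per_trial_request) for the scarcest resource type.
per_trial = {"cpu": 2, "gpu": 0.5}   # hypothetical per-trial request
cluster = {"cpu": 16, "gpu": 2}      # hypothetical cluster capacity

max_concurrent = min(int(cluster[r] / per_trial[r]) for r in per_trial)
print(max_concurrent)  # 4 — limited by GPUs: 2 / 0.5
```

So a job with per-trial resources set does not grab the whole cluster; it is bounded by whichever resource runs out first.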

I’ve read many posts on this topic, but none of them seem to give a concrete answer.

Hey, could you explain more about the existing training jobs that you are submitting?

By nature, Ray Tune should not be too different from other jobs, but in practice it does not handle resource contention across different jobs well. For example, if two separate Tune jobs get scheduled on the same node, your users can run into issues caused by resource contention.

existing training jobs that you are submitting

They are just normal Ray tasks and actors.
In what way is Ray Tune different from normal Ray jobs?