How to use only a single accelerator type when running Ray tune in a Ray cluster?

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

As the title suggests:
Is there a way to specify the accelerator type that Ray Tune is allowed to use when running trials in a cluster with different GPU types?

Hypothetical Example:
I have 2 nodes with several GPUs installed

A = T4
B = V100

Node 1: [A, A, A, A]
Node 2: [A, A, B, B]

I set up a Ray cluster that manages Node 1 and Node 2. After that, I want to start a Ray Tune run that only uses GPUs of type A. Is there a way to specify resources_per_trial in tune.run such that the B GPUs are omitted but all 6 A GPUs are used?

I know this question is rather vague, but so far I have only found this: [Tune] [SGD] [RLlib] Distribute Training Across Nodes with Different GPUs - #5 by kai.

Ray will automatically detect NVIDIA accelerators and create custom resources for them (e.g. a node with 1 A100 GPU will have 1.0 of a custom resource accelerator_type:A100). For other types of accelerators, you can specify custom resources manually in the cluster configuration (e.g. Cluster YAML Configuration Options — Ray 2.1.0) or, if you are creating the cluster manually, with ray start on the CLI using the --resources argument (Cluster Management CLI — Ray 2.1.0). Do note that if you specify custom resources yourself, Ray will not actually know how to schedule tasks correctly on nodes with multiple types of custom resources mapping to accelerators.
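To give a rough idea, here is a minimal local sketch of advertising a custom resource yourself (the resource name is just an example, not an official label):

import ray

# Minimal sketch: advertise a custom resource on the node this process starts.
# On a real cluster you would do the equivalent per node, e.g. via the
# `resources` field of the cluster YAML or
# `ray start --resources='{"accelerator_type:MY_ACCEL": 1}'`.
ray.init(resources={"accelerator_type:MY_ACCEL": 1})
print(ray.cluster_resources())  # should now include the custom resource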

You can then use tune.with_resources or ScalingConfig (if using a Ray AIR Trainer) to request a unit of that custom resource in your trials alongside the CPU and GPU resources. For more information, see Ray Tune FAQ — Ray 2.1.0.
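For example, a rough (untested) sketch with a dummy trainable, where each trial asks for 1 CPU, 1 GPU and one unit of the T4 resource; the T4 label is just an example, use whatever ray.cluster_resources() reports on your nodes:

from ray import tune
from ray.tune import PlacementGroupFactory

# Dummy trainable, only here to illustrate the resource request.
def train_fn(config):
    return {"score": config["lr"]}

trainable = tune.with_resources(
    train_fn,
    PlacementGroupFactory([{"CPU": 1, "GPU": 1, "accelerator_type:T4": 1}]))

tuner = tune.Tuner(trainable, param_space={"lr": tune.loguniform(1e-4, 1e-1)})
results = tuner.fit()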

@Yard1 thank you very much for your quick response!

I have managed to create a Ray cluster with several nodes and also to distribute my Ray Tune trials across all the available devices (I only have NVIDIA accelerators).

So what I need is actually the second part of your answer.

The task I want to accomplish boils down to hyperparameter tuning with ray_lightning, as described here: GitHub - ray-project/ray_lightning: Pytorch Lightning Distributed Accelerators using Ray

from ray import tune

from ray_lightning import RayStrategy
from ray_lightning.examples.ray_ddp_example import MNISTClassifier
from ray_lightning.tune import TuneReportCallback, get_tune_resources

import pytorch_lightning as pl


def train_mnist(config):
    
    # Create your PTL model.
    model = MNISTClassifier(config)

    # Create the Tune Reporting Callback
    metrics = {"loss": "ptl/val_loss", "acc": "ptl/val_accuracy"}
    callbacks = [TuneReportCallback(metrics, on="validation_end")]
    
    trainer = pl.Trainer(
        max_epochs=4,
        callbacks=callbacks,
        strategy=RayStrategy(num_workers=4, use_gpu=False))
    trainer.fit(model)
    
config = {
    "layer_1": tune.choice([32, 64, 128]),
    "layer_2": tune.choice([64, 128, 256]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([32, 64, 128]),
}

# Make sure to pass in ``resources_per_trial`` using the ``get_tune_resources`` utility.
analysis = tune.run(
        train_mnist,
        metric="loss",
        mode="min",
        config=config,
        num_samples=2,
        resources_per_trial=get_tune_resources(num_workers=4),
        name="tune_mnist")
        
print("Best hyperparameters found were: ", analysis.best_config)

Since get_tune_resources gives me a PlacementGroupFactory without a specified accelerator, I am not sure where to specify it.

Do you have a suggestion for how I should set up tune.run accordingly? Or do I need to switch to tune.Tuner?

PS: I am not sure if this is relevant, but from the PR history it seems that Ray 2.x is not yet supported ([experiment] update the ray lightning to 1.7 by JiahaoYao · Pull Request #222 · ray-project/ray_lightning · GitHub).

You’d define the accelerator via the resources_per_worker argument of RayStrategy, e.g. RayStrategy(num_workers=4, use_gpu=True, resources_per_worker={"accelerator_type:A100": 1}).

That way, each of the 4 workers will be scheduled on an A100 GPU (use_gpu=True is also necessary for CUDA_VISIBLE_DEVICES to be set correctly).
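In your train_mnist this would look roughly like the following sketch (A100 is again just an example label; callbacks is the TuneReportCallback list from your script):

import pytorch_lightning as pl
from ray_lightning import RayStrategy

callbacks = []  # your TuneReportCallback list from the script above

trainer = pl.Trainer(
    max_epochs=4,
    callbacks=callbacks,
    strategy=RayStrategy(
        num_workers=4,
        use_gpu=True,
        # Each of the 4 workers requests one unit of the custom accelerator resource.
        resources_per_worker={"accelerator_type:A100": 1}))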

As for get_tune_resources, it looks like that argument is missing there. As a workaround, you could pass in a PlacementGroupFactory directly, like this:
resources_per_trial=PlacementGroupFactory([{"CPU": 1}] + [{"GPU": 1, "accelerator_type:A100": 1}] * num_workers). The first element (bundle) is used by the train_mnist function itself, and every other element is used by any child tasks/workers it spawns.
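Plugged into your tune.run call from above (using your train_mnist and config), it would look roughly like this (untested sketch; again, A100 is just an example label):

from ray import tune
from ray.tune import PlacementGroupFactory

num_workers = 4
analysis = tune.run(
    train_mnist,
    metric="loss",
    mode="min",
    config=config,
    num_samples=2,
    # One bundle for the train_mnist driver plus one GPU bundle per RayStrategy worker.
    resources_per_trial=PlacementGroupFactory(
        [{"CPU": 1}] + [{"GPU": 1, "accelerator_type:A100": 1}] * num_workers),
    name="tune_mnist")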

Let me know how that goes! I’ll also check whether ray-lightning can be used with 2.0 or not.

This seems to go in the right direction.

I have modified get_tune_resources from ray_lightning:

    import warnings
    from typing import Optional

    def get_tune_resources(
            num_workers: int = 1,
            num_cpus_per_worker: int = 1,
            use_gpu: bool = False,
            # Deprecated args.
            cpus_per_worker: Optional[int] = None,
    ) -> "PlacementGroupFactory":
        """Returns the PlacementGroupFactory to use for Ray Tune."""
        from ray.tune import PlacementGroupFactory

        if cpus_per_worker is not None:
            # TODO(amogkam): Remove `cpus_per_worker` on next major release.
            num_cpus_per_worker = cpus_per_worker
            warnings.warn(
                "`cpus_per_worker` will be deprecated in the "
                "future. Use "
                "`num_cpus_per_worker` instead.", PendingDeprecationWarning)

        head_bundle = {"CPU": 1}
        # Each worker bundle now also requests one unit of the T4 custom resource.
        child_bundle = {
            "CPU": num_cpus_per_worker,
            "GPU": int(use_gpu),
            "accelerator_type:T4": 1,
        }
        child_bundles = [child_bundle.copy() for _ in range(num_workers)]
        bundles = [head_bundle] + child_bundles
        placement_group_factory = PlacementGroupFactory(
            bundles, strategy="PACK")
        return placement_group_factory

and I have also added resources_per_worker={"accelerator_type:T4": 1} to RayStrategy.
However, now it uses only a single T4 to run all the trials, so nothing runs in parallel.
FYI: for now I only want to use 1 GPU per trial, but I still instantiate RayStrategy even though I do not need DDP.

One additional thing: one node has a mixture of Tesla T4 and NVIDIA RTX A6000 GPUs.
I have checked how the parsing is done and, if I am not mistaken, it happens somewhere here: ray/resource_spec.py at 404a66188102969b9e6f9a344e5dc010ba10092c · ray-project/ray · GitHub. So the A6000s are actually parsed as RTX from GPU.name.
I have then added it here: ray/accelerators.py at 8be5f016afefb2e199fa45416a8c2021e05805e0 · ray-project/ray · GitHub
However, when I specify accelerator_type:RTX I get

Error: No available node types can fulfill resource request {'accelerator_type:RTX': 1.0, 'GPU': 1.0}. Add suitable node types to this cluster to resolve this issue.

I can also move this discussion to an issue if necessary.

I see - I think it may actually set just one accelerator resource unit per node. Can you do "accelerator_type:T4": 0.01 instead?
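I.e., in your modified get_tune_resources the child bundle would become something like this (just a sketch):

# Request only a small fraction of the per-node accelerator_type resource so
# that multiple workers/trials can be packed onto the same node.
child_bundle = {
    "CPU": num_cpus_per_worker,
    "GPU": int(use_gpu),
    "accelerator_type:T4": 0.01,
}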

I’ll take a look at the parsing later today, thanks!

Yes, setting "accelerator_type:T4": 0.01 leads to several jobs running in parallel. But it again uses all available GPUs, i.e. the T4s as well as the A6000s.

Another thing I noticed: is it possible that Ray infers the accelerator_type of a single node based on the GPU with id=0? It recognizes that there are two different accelerator types present, but it seems to assume both have the same GPU.name.

Concerning parsing: is there a reason why the accelerator names need to be parsed at all when they are already given as enums? I.e., why is Tesla T4 reduced to T4?

Looking at the code, I don’t see whether it’s possible to force a task to run on a specific GPU if a node has two types of them, but I’ll confirm that.

Looking at the code, it seems that indeed only the first GPU will be considered for detection:

def _get_gpu_info_string():
    """Get the gpu type for this machine.

    TODO: Detects maximum one NVidia gpu type on linux

I am not familiar with the parsing code myself, but I’ll see if I can get someone from the Core team to answer.

I followed up with @Alex:

  1. Ray isn’t intended to work with multiple GPU types on the same node and operates under the assumption that each node will only have 1 GPU type.
  2. It is not possible to force a task/actor to be scheduled on a specific GPU.

Can you elaborate a little more on your setup? How did you end up with a node with 2 GPU types?

FYI, I have confirmed that ray-lightning works fine with Ray 2.0 and above. The PR concerns updating to pytorch-lightning 1.7.

Thank you very much for the confirmation!

I need to check with our engineers, but I think it was intended as an experiment. Since there have been no obvious problems so far, the decision was never revisited.

But in any case, would it be hard to add this feature?
Since it is possible to restrict GPU usage on a local cluster with CUDA_VISIBLE_DEVICES, I would assume that there may be some way to restrict usage similarly across a cluster; see the sketch below for what I mean. Sorry for my ignorance of how a Ray cluster works under the hood with multiple nodes.
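For illustration, this is roughly what I already do on a single machine (I have not tested whether the same idea transfers to the worker nodes of a cluster):

import os

# On a single machine I can hide GPUs before Ray starts, e.g. only expose the
# devices with indices 0 and 1; Ray then only detects those two.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import ray

ray.init()
print(ray.cluster_resources())  # reports only the visible GPUs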

@Alex, could you respond here?