Ray Tune with Pytorch Lightning not recognizing GPU

Hi!

I’m trying to use Ray Tune for hyperparameter search. Each model is trained with PyTorch Lightning (PTL). Weirdly, I’m getting the following error:

lightning_lite.utilities.exceptions.MisconfigurationException: No supported gpu backend found!

The distributed hparam search works on CPU, and training without Ray works fine on GPU. Is this a compatibility issue between PTL and Ray Tune?

Note that I’m using a single GPU for each training job:

    metrics = {"loss": "val_loss"}
    callbacks = [TuneReportCallback(metrics, on="validation_end")] if args.tune else []

    trainer = pl.Trainer(
        devices=1,
        accelerator="gpu",
        precision=64,
        gradient_clip_val=args.grad_clip_val,
        limit_train_batches=1.0,
        log_every_n_steps=10,
        callbacks=callbacks,
    )
    trainer.fit(model, train_loader, val_loader)

Hey @GeoffNN,

You do need to tell Tune to request GPUs for each trial; otherwise trials are scheduled with 0 GPUs and PTL cannot find a GPU backend. Are you setting the right configuration here: A Guide To Parallelism and Resources — Ray 2.2.0?


Hi @amogkam! I missed that in the pytorch-lightning Ray Tune tutorial.
Thanks for the link – I fixed my code by wrapping my trainable in tune.with_resources(train_model, {'cpu': 10, 'gpu': 1}):

        tuner = tune.Tuner(
            tune.with_resources(train_model, {'cpu': 10, 'gpu': 1}),
            tune_config=tune.TuneConfig(
                metric="loss",
                mode="min",
                num_samples=num_samples,
                scheduler=scheduler,
                max_concurrent_trials=max_concurrent_trials,
            ),
            param_space=config,
            run_config=air.RunConfig(name="tune"),
        )