Ray Tune with Pytorch Lightning not recognizing GPU

Hi!

I’m trying to use Ray Tune for hyperparameter search. Each model is trained with PyTorch Lightning (PTL). Weirdly, I’m getting the following error:

lightning_lite.utilities.exceptions.MisconfigurationException: No supported gpu backend found!

The distributed hparam search works on CPU, and training without Ray works fine on GPU. Is this a compatibility issue between PTL and Ray Tune?

Note that I’m using a single GPU for each training job:

    metrics = {"loss": "val_loss"}
    callbacks = [TuneReportCallback(metrics, on="validation_end")] if args.tune else []

    trainer = pl.Trainer(
        devices=1,
        accelerator="gpu",
        precision=64,
        gradient_clip_val=args.grad_clip_val,
        limit_train_batches=1.0,
        log_every_n_steps=10,
        callbacks=callbacks,
    )
    trainer.fit(model, train_loader, val_loader)

Hey @GeoffNN,

You do need to tell Tune to request GPUs for each trial; otherwise trials are scheduled with 0 GPUs and PTL cannot find a GPU backend. Are you setting the right configuration here: A Guide To Parallelism and Resources — Ray 2.2.0?


Hi @amogkam! I missed that in the pytorch-lightning Ray Tune tutorial.
Thanks for the link – I fixed my code by wrapping my trainable in tune.with_resources(train_model, {'cpu': 10, 'gpu': 1}):

        tuner = tune.Tuner(
            tune.with_resources(train_model, {'cpu': 10, 'gpu': 1}),
            tune_config=tune.TuneConfig(
                metric="loss",
                mode="min",
                num_samples=num_samples,
                scheduler=scheduler,
                max_concurrent_trials=max_concurrent_trials,
            ),
            param_space=config,
            run_config=air.RunConfig(name="tune"),
        )