How do I run my experiment on a single GPU?

I’m currently using tune.Tuner to run my experiment on a machine with 28 CPUs and 2 GPUs. Since I’m not the only one with access to this machine, I’d like to restrict my experiment to a single GPU.

Despite specifying with_resources(Trainer, {"cpu": 1, "gpu": 1}), both GPUs are used. The only way I’ve found to avoid this is to set os.environ["CUDA_VISIBLE_DEVICES"] = "1" myself.

Is there a way to achieve this without explicitly setting the environment variable myself? If I understand correctly, the documentation says this should be handled by tune.with_resources:

To leverage GPUs, you must set gpu in tune.with_resources(trainable, resources_per_trial). This will automatically set CUDA_VISIBLE_DEVICES for each trial.
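
As a quick sanity check, you can read CUDA_VISIBLE_DEVICES inside the trainable and log it per trial — a minimal sketch (the function name and print are just illustrative; when run under Tune, the value is whatever Ray assigned to that trial):

```python
import os

def trainable(config):
    # Inside a Tune trial, Ray has already restricted
    # CUDA_VISIBLE_DEVICES to the GPU(s) assigned to this trial,
    # so frameworks such as PyTorch only see those devices.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    print(f"trial sees GPUs: {visible!r}")
    return visible
```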

run_config = RunConfig(
    stop={"training_iteration": epochs},
    checkpoint_config=ck_config,
    name=f"{model_name}_{exp_details}",
    local_dir=str(Path(__file__).parent / "ray_checkpoints")
)

tuner = Tuner(
    trainable=with_resources(Trainer, {"cpu": 1, "gpu": 1}),
    run_config=run_config,
    tune_config=TuneConfig(mode="min", metric="val_loss", num_samples=5),
    param_space=configuration,
)

Hi @mtt,

tune.with_resources sets the resources per trial. Since each of your 5 trials requests 1 GPU and the machine has 2 GPUs, Tune will schedule 2 trials at a time, each taking one of the GPUs.

You can specify fractional GPUs per trial, as well as limit concurrency so you never exceed a given GPU budget. For example, the following will run at most 2 trials concurrently, sharing a single GPU (0.5 GPU each):

tuner = Tuner(
+   trainable=with_resources(Trainer, {"cpu": 1, "gpu": 0.5}),
    run_config=run_config,
    tune_config=TuneConfig(
        mode="min",
        metric="val_loss",
        num_samples=5,
+       max_concurrent_trials=2,
    ),
    param_space=configuration,
)
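
Alternatively, if you want to cap total usage at one physical GPU regardless of how trials are scheduled, you can limit the resources Ray itself sees when you initialize it — a sketch, assuming you call ray.init yourself before building the Tuner (note that Ray assigns device IDs starting from 0, so combine this with CUDA_VISIBLE_DEVICES if you need the second physical GPU specifically):

```python
import ray

# Advertise only 1 GPU to Ray; Tune will then never schedule more
# than 1 GPU's worth of trials on this machine, even though it
# physically has 2 GPUs.
ray.init(num_gpus=1)
```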

See Ray Tune FAQ — Ray 2.3.0 for more info.