Hi all!
I just recently discovered Ray Tune and would like to use it in combination with PyTorch Lightning for hyperparameter optimization.
So I was very pleased when I found ray_lightning. However, after getting the example from the GitHub repo running on CPU,
I wanted to try something bolder and use GPU(s). When I modified the get_tune_ddp_resources
call to request GPUs:
...
resources_per_trial=get_tune_ddp_resources(
    num_workers=1,
    cpus_per_worker=1,
    use_gpu=True
),
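For context, the surrounding code follows the Tune example from the ray_lightning repo; roughly like this (simplified sketch, MyLightningModule and the metric names are placeholders, not my exact code):

import pytorch_lightning as pl
from ray import tune
from ray_lightning import RayPlugin
from ray_lightning.tune import TuneReportCallback, get_tune_ddp_resources

def train_fn(config):
    # Placeholder standing in for my actual LightningModule.
    model = MyLightningModule(config)
    trainer = pl.Trainer(
        max_epochs=4,
        # Report a validation metric back to Tune after each validation epoch.
        callbacks=[TuneReportCallback({"loss": "val_loss"}, on="validation_end")],
        # One Ray worker per trial, with GPU training enabled.
        plugins=[RayPlugin(num_workers=1, use_gpu=True)],
    )
    trainer.fit(model)

analysis = tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    resources_per_trial=get_tune_ddp_resources(
        num_workers=1,
        cpus_per_worker=1,
        use_gpu=True
    ),
    metric="loss",
    mode="min",
    num_samples=2
)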
I got the following warning:
WARNING tune.py:506 -- Tune detects GPUs, but no trials are using GPUs. To enable trials to use GPUs, set tune.run(resources_per_trial={'gpu': 1}...) which allows Tune to expose 1 GPU to each trial. You can also override `Trainable.default_resource_request` if using the Trainable API.
In combination with ray_lightning, it indeed seemed that no GPUs were allocated for the trials.
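(I checked from inside the training function with the same CUDA_VISIBLE_DEVICES lookup as in the dummy setup below; here as a small sketch, with torch.cuda.is_available() added as an extra sanity check:)

import os

import torch

def report_gpu_visibility():
    # What Ray exposes to the trial process via the environment ...
    print("VISIBLE:", os.environ.get("CUDA_VISIBLE_DEVICES", None))
    # ... and whether PyTorch can actually see a CUDA device.
    print("CUDA available:", torch.cuda.is_available())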
But when I tried to reproduce the problem with this short dummy setup:
import os

from ray import tune

def dummy_function(config):
    # Print which GPUs Ray exposes to this trial, then report a dummy metric.
    print("VISIBLE: ", os.environ.get("CUDA_VISIBLE_DEVICES", None))
    tune.report(minimum=config["value"])

def tune_dummy_placement_group():
    # Variant 1: request one CPU and one GPU per trial via a placement group.
    config = {"value": tune.loguniform(1e-4, 1e-1)}
    analysis = tune.run(
        tune.with_parameters(dummy_function),
        resources_per_trial=tune.PlacementGroupFactory([{"CPU": 1, "GPU": 1}]),
        metric="minimum",
        config=config,
        mode="min",
        num_samples=3
    )

def tune_dummy_resource_dict():
    # Variant 2: request the same resources via a plain resource dict.
    config = {"value": tune.loguniform(1e-4, 1e-1)}
    analysis = tune.run(
        tune.with_parameters(dummy_function),
        resources_per_trial={
            "cpu": 1,
            "gpu": 1
        },
        metric="minimum",
        config=config,
        mode="min"
    )
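To run both variants (on a machine with at least one GPU):

if __name__ == "__main__":
    import ray

    ray.init()
    tune_dummy_placement_group()
    tune_dummy_resource_dict()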
I discovered that GPUs ARE exposed: in both cases I was able to read CUDA_VISIBLE_DEVICES inside the trial.
So coming to my questions:
- Why is the warning triggered even though the GPUs seem to be exposed as expected?
- Why are they NOT exposed when using ray_lightning?