There is currently no nice out-of-the-box solution in Ray to handle this, so the solution will be custom to your environment.
You can use tune.with_resources to dynamically specify the resources that should be allocated to a trial.
Ray automatically creates device-specific resources:
>>> ray.cluster_resources()
{'object_store_memory': 74558149015.0, 'node:172.31.76.223': 1.0, 'CPU': 36.0, 'memory': 192337857739.0, 'GPU': 4.0, 'accelerator_type:V100': 4.0, 'node:172.31.71.184': 1.0, 'node:172.31.68.10': 1.0, 'node:172.31.72.165': 1.0, 'node:172.31.90.28': 1.0}
Note the 'accelerator_type:V100': 4.0 above (this cluster happens to have only one accelerator type).
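If all you need is to pin every trial to one specific accelerator type, you can request that resource directly. This is a minimal sketch under some assumptions: train_fn is a placeholder for your trainable, and the amount requested for the accelerator_type:V100 marker resource (1 here) is just illustrative and should match how that resource is exposed on your nodes:

    from ray import tune

    def train_fn(config):
        # placeholder trainable; replace with your training function
        ...

    # Each trial asks for 1 CPU, 1 GPU and the V100 marker resource, so it
    # can only be scheduled on nodes that expose accelerator_type:V100.
    tuner = tune.Tuner(
        tune.with_resources(
            trainable=train_fn,
            resources=tune.PlacementGroupFactory(
                [{"CPU": 1, "GPU": 1, "accelerator_type:V100": 1}]
            ),
        ),
    )
    tuner.fit()

With multiple accelerator types in the cluster, though, a single fixed bundle is not enough, which is where the sampling approach below comes in.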
What you could do is randomly sample one of the accelerator types for each trial, e.g. like this:
import random

from ray import tune

# One resource bundle per accelerator type; the "accelerator_type:..." keys
# should match the resources reported by ray.cluster_resources().
items = [
    {"cpu": 8, "gpu": 0.5, "custom_resources": {"accelerator_type:A": 0.5}},
    {"cpu": 8, "gpu": 1, "custom_resources": {"accelerator_type:B": 1}},
]

tuner = tune.Tuner(
    tune.with_resources(
        trainable=train_fn,
        # Pick a random resource bundle for every trial.
        resources=lambda config: random.choice(items),
    ),
)
tuner.fit()
This is not ideal, as we could theoretically always sample the same device and hence leave one of the GPUs unused. With a large number of trials this shouldn't be a problem, but for, e.g., only 6 trials it wouldn't be great.
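To avoid that, you could hand out the bundles round-robin instead of sampling randomly, e.g. with itertools.cycle. This is a minimal sketch that reuses the items list and train_fn from above and assumes the callable passed to tune.with_resources is evaluated once per trial:

    from itertools import cycle

    from ray import tune

    # Cycle through the bundles so both accelerator types get used
    # even with a small number of trials.
    resource_iter = cycle(items)

    tuner = tune.Tuner(
        tune.with_resources(
            trainable=train_fn,
            # Each trial takes the next bundle from the cycle.
            resources=lambda config: next(resource_iter),
        ),
    )
    tuner.fit()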