Using fractional GPUs with TorchTrainer and the Tuner API

Hi, I have a job using the TorchTrainer API that looks something like this:

from ray.train.torch import TorchTrainer
from ray.tune import Tuner

trainer = TorchTrainer(...)
tuner = Tuner(trainable=trainer, ...)
tuner.fit()

Everything is working as expected, including training with a single worker on a GPU. However, I only have a single GPU, and the models are small enough that several could fit on it at once. I'd like to know how to let the tuner use fractional GPUs so that I can run multiple trials concurrently.

The docs here seem to suggest wrapping the Trainer with tune.with_resources, but that doesn't work, because Trainer doesn't inherit from Trainable.

What’s the correct way to specify fractional GPU usage with the Tuner API and TorchTrainer (or Trainer more generally)?

You can specify resource requirements for a Trainer using a ScalingConfig: Configurations User Guide — Ray 2.1.0

In your case, you’d do:
TorchTrainer(..., scaling_config=ScalingConfig(resources_per_worker={"GPU": 0.5}))

Just wanted to say: thank you! This worked 🙂