Hey! What is the total amount of resources you want to schedule for a single TorchTrainer?
Based on your current ScalingConfig, each Trainer/Trial will request a total of
trainer_resources + num_workers * resources_per_worker
where:
trainer_resources = 1 CPU
num_workers = 2
resources_per_worker = 0.5 GPU
So in this case, you’ll be requesting a total of 1 CPU and 1 GPU - does that match what you’re seeing in the console output?
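To double-check the arithmetic, here's a small illustrative sketch (plain Python, not the actual Ray API) that sums trainer_resources + num_workers * resources_per_worker per resource key, using the values from the ScalingConfig above:

```python
# Hypothetical helper to total the resources one TorchTrainer would request.
def total_resources(trainer_resources, num_workers, resources_per_worker):
    """Return trainer_resources + num_workers * resources_per_worker, per key."""
    total = dict(trainer_resources)
    for key, amount in resources_per_worker.items():
        total[key] = total.get(key, 0) + num_workers * amount
    return total

# Values from the ScalingConfig in this thread.
print(total_resources({"CPU": 1}, 2, {"GPU": 0.5}))
# -> {'CPU': 1, 'GPU': 1.0}
```

With 2 workers at 0.5 GPU each plus the 1 CPU for the trainer, that comes out to 1 CPU and 1 GPU total.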