I am tuning the hyper-parameters of the ResNet-50 model.
Previously, the peak GPU memory consumption with batch_size=256 was around 8 GB when running without Ray.
Now I am running 8 trials on 8 GPUs (1 GPU per trial), but each trial consumes 14 GB even with batch_size=128. Is there some model parallelism running under the hood that causes this GPU consumption, or is it something else?
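For context, the trials are launched roughly like this (a minimal sketch, assuming Ray Tune with one GPU reserved per trial; `train_resnet` and the `param_space` contents are placeholders, not my exact script):

```python
from ray import tune

def train_resnet(config):
    # Placeholder for the actual ResNet-50 training loop; with
    # {"gpu": 1} reserved, Ray sets CUDA_VISIBLE_DEVICES so each
    # trial sees exactly one GPU.
    ...

tuner = tune.Tuner(
    # Reserve 1 GPU per trial, so 8 concurrent trials fill 8 GPUs.
    tune.with_resources(train_resnet, resources={"gpu": 1}),
    param_space={"batch_size": 128},
    tune_config=tune.TuneConfig(num_samples=8),
)
tuner.fit()
```

With this setup, each trial is an independent single-GPU worker, so I would not expect any model parallelism unless the training function itself sets it up.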