Hello. Thank you in advance for your response.
I am working on a project that uses Ray Tune and I’ve noticed slower performance when GPU drivers are recognized by Ray.
I set up two GCP VMs. Both are machine type n1-standard-32 (32 vCPUs, 120 GB memory), and one of them also has 4 x NVIDIA Tesla K80 GPUs. Both VMs use an Ubuntu 18.04 base image.
The NVIDIA stack I installed is CUDA compilation tools, release 10.1, V10.1.243, and cuDNN `libcudnn.so.7.6.5`.
I used this script to test each VM. Keeping everything else the same, I changed some of the parameters in `tune.run()`.
```python
analysis = tune.run(
    train_mnist,
    name="exp",
    scheduler=sched,
    metric="mean_accuracy",
    mode="max",
    stop={
        "mean_accuracy": 0.99,
        "training_iteration": num_training_iterations,
    },
    num_samples=64,
    resources_per_trial={"cpu": 1, "gpu": 0},
    config={
        "threads": 2,
        "lr": tune.uniform(0.09, 0.1),
        "momentum": tune.uniform(0.8, 0.9),
        "hidden": tune.randint(32, 64),
    },
)
```
I also added a timer:
```python
if __name__ == "__main__":
    import time

    start = time.time()
    tune_mnist(num_training_iterations=50)
    print(time.time() - start)
```
Using conda, I set up my environment with:
```
conda create -n ray-env-1_2 tensorflow=2.2 pandas
pip install ray[tune]==1.2
```
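As a quick sanity check (not part of the timed script), I confirm the versions and that this CPU-only TensorFlow build does not see any GPUs:

```python
# Run inside the ray-env-1_2 environment.
import ray
import tensorflow as tf

print(ray.__version__)                          # expecting 1.2.0
print(tf.__version__)                           # expecting 2.2.x
print(tf.config.list_physical_devices("GPU"))   # expecting [] with the CPU-only build
```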
Running the same script in the same environment on each VM gave the following runtimes:
- CPU VM: 83.4s
- GPU VM: 105.7s
I do notice that, despite the conda environment not having a GPU-compatible TensorFlow build, Ray recognizes the GPUs, since it prints `Resources requested: 32/32 CPUs, 0/4 GPUs`.
I exported each environment's yaml file and confirmed the two environments were identical. I also tried Ray version 1.4, but the runtime was the same.
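The same detection is visible outside of Tune. A minimal sketch (using Ray's default auto-detection) that prints what resources Ray thinks the node has:

```python
import ray

ray.init()                      # auto-detects CPUs and GPUs on the node
print(ray.cluster_resources())  # on the GPU VM I'd expect something like {'CPU': 32.0, 'GPU': 4.0, ...}
ray.shutdown()
```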
I then created a new conda environment that could utilize the GPUs:
```
conda create -n ray-env-1_2-gpu tensorflow-gpu=2.2 pandas
pip install ray[tune]==1.2
```
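A quick check inside this environment that the GPU build of TensorFlow can see the K80s (same call as in the Noteworthy list below):

```python
# Run inside the ray-env-1_2-gpu environment.
import tensorflow as tf

print(tf.__version__)                          # expecting 2.2.x
print(tf.config.list_physical_devices("GPU"))  # expecting four PhysicalDevice entries for the K80s
```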
With `resources_per_trial={"cpu": 1, "gpu": 0}`, the new runtime was:

- GPU VM: 125.8s
Changing the settings to `resources_per_trial={"cpu": 1, "gpu": 1}`, the new runtime was:

- GPU VM: 344.8s
Noteworthy
- In the conda environment with tensorflow-gpu, I was able to see the GPUs with `tf.config.list_physical_devices('GPU')`.
- If `resources_per_trial={"cpu": 1, "gpu": 0}`, I see the error message `E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`. Setting `os.environ['CUDA_VISIBLE_DEVICES'] = "0,1,2,3"` makes no difference to the runtime or the error message.
- If `resources_per_trial={"cpu": 1, "gpu": 1}`, the `CUDA_ERROR_NO_DEVICE` error goes away, but the runtime is much slower.
- I tried `ray.init(num_gpus=0)` and the GPUs are no longer reported: `Resources requested: 32/32 CPUs, 0/0 GPUs`. However, the runtime is still the same at ~125s (see the sketch after this list for how I am trying to hide the GPUs entirely).
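Here is roughly what I am experimenting with to hide the GPUs from both TensorFlow and Ray, in case it helps clarify my setup. This is only a sketch, and I am assuming that setting `CUDA_VISIBLE_DEVICES` before importing TensorFlow and before `ray.init()` is enough for the workers to inherit it:

```python
import os

# Hide the GPUs from CUDA before anything initializes it. I assume this has to be
# set before TensorFlow is imported and before Ray starts its worker processes.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import ray
import tensorflow as tf

ray.init(num_gpus=0)                           # also tell Ray not to advertise any GPUs
print(ray.cluster_resources().get("GPU", 0))   # expecting 0
print(tf.config.list_physical_devices("GPU"))  # expecting []
```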
I am new to Ray and have been tasked with making performance comparisons between using CPUs and GPUs. However, I cannot have confidence in any comparison I make if there is such a wide range of runtimes. Can someone please explain what is going on?
I saw here that Richard said:

> setup for the distributed job actually takes quite a long time (25 seconds).
My theory is that because Ray recognizes the GPUs, it spends additional time setting up and distributing the trials; the CPU VM is perhaps faster simply because there are no GPUs for Ray to account for.
He goes on to say:

> Obviously, if your training run takes 1 hour, this will not be an issue.
However, I am running experiments for work, and the differences I am seeing are large enough to be a problem: an experiment that takes 80.5 min on a VM with GPUs and `resources_per_trial={"cpu": 1, "gpu": 0}` takes 54.26 min on a VM without any GPUs.
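To check whether per-trial setup overhead explains this gap, I am thinking of timing Ray startup and a batch of do-nothing trials separately on each VM. A rough sketch (the `noop` trainable is just a placeholder, not my real training function):

```python
import time

import ray
from ray import tune

t0 = time.time()
ray.init()                       # same startup path the real experiment uses
print("ray.init:", time.time() - t0, "s")

def noop(config):
    # Report a single dummy metric so Tune treats the trial as finished.
    tune.report(dummy=1)

t0 = time.time()
tune.run(noop, num_samples=64, resources_per_trial={"cpu": 1})
print("64 no-op trials:", time.time() - t0, "s")
```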
It seems to me that Ray Tune is taking additional steps simply because it detects that GPUs exist. Is there a setting to hide the GPUs so that the GPU VM performs the same as a VM without any GPUs? Is there another way to use Ray Tune to compare a VM with and without GPUs? Any help is appreciated!