How to test if TensorFlow can see the GPU on a worker node using Ray?

Hey folks,

I was playing around trying to get up and running with TensorFlow on Ray. I found the following script helpful for verifying that everything was working, and I wanted to share it with the community.

import ray

ray.init()
# Confirm the cluster actually advertises GPU resources before scheduling anything
print(ray.available_resources())
print(ray.cluster_resources())


@ray.remote(num_gpus=1)
def test_tf():
    # Import inside the task so it runs on the worker, not the (CPU-only) head node
    import tensorflow as tf
    return tf.config.list_physical_devices("GPU")

# With a single GPU in the cluster, these ten tasks run one at a time
print(ray.get([test_tf.remote() for _ in range(10)]))

My cluster YAML looks like the following:

min_workers: 1
max_workers: 1

docker: 
    image: anyscale/ray-ml:latest-cpu
    head_image: anyscale/ray-ml:latest-cpu
    worker_image: anyscale/ray-ml:latest-gpu
    container_name: ray_container
    pull_before_run: False

head_setup_commands: []

setup_commands:
    - pip install -U ray
    - pip install tensorflow


worker_setup_commands:
    - apt-get install -y libcudnn7=7.6.5.32-1+cuda10.1 libcudnn7-dev=7.6.5.32-1+cuda10.1

idle_timeout_minutes: 5

provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2a

worker_nodes:
    InstanceType: p2.xlarge
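For anyone following along, this is roughly how I drive it: `ray up` / `ray submit` / `ray down` are the standard Ray cluster-launcher commands, while `cluster.yaml` and `test_gpu.py` are just placeholder names for the YAML above and the test script.

```shell
# Start (or update) the cluster described by the YAML above
ray up cluster.yaml -y

# Run the GPU test script on the cluster's head node
ray submit cluster.yaml test_gpu.py

# Tear the cluster down when you are done
ray down cluster.yaml -y
```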

The above worked great for me and allowed me to easily debug my application. Please let us know if it doesn’t work for you!


Thanks a lot for the script! The first part worked for me. What output did you get from print(ray.get([test_tf.remote() for x in range(10)]))? I am still debugging, and I get [[], [], [], [], [], [], [], [], [], []].

You should get actual device entries, not just empty lists, IIRC. If the lists are empty, like in your output, it means TensorFlow can't see a GPU on the worker!
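One way to narrow down an empty result is to check, inside the remote task, both what Ray assigned and what TensorFlow reports. interpret_gpu_check below is a hypothetical helper (not from the thread) that sketches that logic; inside the task you would pass it tf.config.list_physical_devices("GPU") and os.environ.get("CUDA_VISIBLE_DEVICES").

```python
def interpret_gpu_check(devices, cuda_visible):
    """Classify a GPU-visibility result from inside a Ray task.

    devices:      the list returned by tf.config.list_physical_devices("GPU")
    cuda_visible: os.environ.get("CUDA_VISIBLE_DEVICES") in the same task
                  (Ray sets this to the GPU ids it reserved for the task)
    """
    if devices:
        return "ok: TensorFlow sees a GPU"
    if not cuda_visible:
        # Ray never reserved a GPU: check num_gpus=1 and the worker node type
        return "no GPU assigned by Ray"
    # Ray reserved a GPU but TensorFlow can't use it: likely a CUDA/cuDNN/driver
    # mismatch on the worker image, or a TensorFlow build without GPU support
    return "GPU assigned but invisible to TensorFlow"
```

If you land in the last case with the YAML above, the cuDNN versions pinned in worker_setup_commands are a good place to start checking against the CUDA version your TensorFlow build expects.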