How to test if TensorFlow can see the GPU on a worker node using Ray?

Hey folks,

I was playing around trying to get up and running with TensorFlow on Ray. I found the following script helpful for verifying that everything was working, and I wanted to share it with the community.

import ray

ray.init()
# Confirm the cluster actually advertises GPU resources before scheduling anything
print(ray.available_resources())
print(ray.cluster_resources())


@ray.remote(num_gpus=1)
def test_tf():
    # Import inside the task so it runs on the worker, not the (CPU-only) head node
    import tensorflow as tf
    return tf.config.list_physical_devices("GPU")

# With a single GPU in the cluster, these ten tasks run one at a time
print(ray.get([test_tf.remote() for _ in range(10)]))

My cluster YAML looks like the following:

min_workers: 1
max_workers: 1

docker: 
    image: anyscale/ray-ml:latest-cpu
    head_image: anyscale/ray-ml:latest-cpu
    worker_image: anyscale/ray-ml:latest-gpu
    container_name: ray_container
    pull_before_run: False

head_setup_commands: []

setup_commands:
    - pip install -U ray
    - pip install tensorflow


worker_setup_commands:
    - apt-get install -y libcudnn7=7.6.5.32-1+cuda10.1 libcudnn7-dev=7.6.5.32-1+cuda10.1

idle_timeout_minutes: 5

provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2a

worker_nodes:
    InstanceType: p2.xlarge
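For anyone following along, this is roughly how I drive it: `ray up` / `ray submit` / `ray down` are the standard Ray cluster-launcher commands, while `cluster.yaml` and `test_gpu.py` are just placeholder names for the YAML above and the test script.

```shell
# Start (or update) the cluster described by the YAML above
ray up cluster.yaml -y

# Run the GPU test script on the cluster's head node
ray submit cluster.yaml test_gpu.py

# Tear the cluster down when you are done
ray down cluster.yaml -y
```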

The above worked great for me and allowed me to easily debug my application. Please let us know if it doesn’t work for you!


Thanks a lot for the script! The first part worked for me. What output did you get from print(ray.get([test_tf.remote() for x in range(10)]))? I am still debugging, and I get [[], [], [], [], [], [], [], [], [], []].

You should get actual device entries, not just empty lists, IIRC. If the lists are empty, like in your output, it means TensorFlow can't see a GPU on the worker!
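One way to narrow down an empty result is to check, inside the remote task, both what Ray assigned and what TensorFlow reports. interpret_gpu_check below is a hypothetical helper (not from the thread) that sketches that logic; inside the task you would pass it tf.config.list_physical_devices("GPU") and os.environ.get("CUDA_VISIBLE_DEVICES").

```python
def interpret_gpu_check(devices, cuda_visible):
    """Classify a GPU-visibility result from inside a Ray task.

    devices:      the list returned by tf.config.list_physical_devices("GPU")
    cuda_visible: os.environ.get("CUDA_VISIBLE_DEVICES") in the same task
                  (Ray sets this to the GPU ids it reserved for the task)
    """
    if devices:
        return "ok: TensorFlow sees a GPU"
    if not cuda_visible:
        # Ray never reserved a GPU: check num_gpus=1 and the worker node type
        return "no GPU assigned by Ray"
    # Ray reserved a GPU but TensorFlow can't use it: likely a CUDA/cuDNN/driver
    # mismatch on the worker image, or a TensorFlow build without GPU support
    return "GPU assigned but invisible to TensorFlow"
```

If you land in the last case with the YAML above, the cuDNN versions pinned in worker_setup_commands are a good place to start checking against the CUDA version your TensorFlow build expects.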