Hey folks,
I was playing around with getting TensorFlow up and running on Ray. I found the following script helpful for making sure everything works, and I wanted to share it with the community.
```python
import ray

ray.init()

print(ray.available_resources())
print(ray.cluster_resources())

@ray.remote(num_gpus=1)
def test_tf():
    import tensorflow as tf
    return tf.config.list_physical_devices("GPU")

print(ray.get([test_tf.remote() for x in range(10)]))
```
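If TensorFlow can see the GPU inside the worker container, each task should return something along the lines of `[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]`, so the final print shows ten of those single-element lists (the exact device name can vary with your TensorFlow version).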
My cluster YAML looks like the following:
```yaml
min_workers: 1
max_workers: 1

docker:
    image: anyscale/ray-ml:latest-cpu
    head_image: anyscale/ray-ml:latest-cpu
    worker_image: anyscale/ray-ml:latest-gpu
    container_name: ray_container
    pull_before_run: False

head_setup_commands: []

setup_commands:
    - pip install -U ray
    - pip install tensorflow

worker_setup_commands:
    - apt-get install -y libcudnn7=7.6.5.32-1+cuda10.1 libcudnn7-dev=7.6.5.32-1+cuda10.1

idle_timeout_minutes: 5

provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2a

worker_nodes:
    InstanceType: p2.xlarge
```
The above worked great for me and allowed me to easily debug my application. Please let us know if it doesn’t work for you!
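For anyone who wants to reproduce this, a minimal launch-and-run sequence would look roughly like the following, assuming the config above is saved as cluster.yaml and the script as test_tf.py (both are just placeholder file names):

```bash
# Start (or update) the cluster from the config above
ray up cluster.yaml

# Copy the script to the head node and run it there
ray submit cluster.yaml test_tf.py

# Tear the cluster down when you're done
ray down cluster.yaml
```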