Ray cannot detect GPU on Databricks cluster

  • High: It blocks me from completing my task.

I am trying to run Ray on Databricks for chunking and embedding tasks. The cluster I'm using is:

g4dn.xlarge
1-4 workers with 4-16 cores
1 GPU and 16GB memory

I have set spark.task.resource.gpu.amount to 0.5 currently.

This is how I have set up my Ray cluster:

setup_ray_cluster(
  min_worker_nodes=1,
  max_worker_nodes=3,
  num_gpus_head_node=1,
)

And this is the chunking function:

@ray.remote(num_gpus=0.2)
def chunk_udf(row):
    texts = row["content"]
    data = row.copy()
    split_text = splitter.split_text(texts)
    split_text = [text.replace("\n", " ") for text in split_text]
    return list(zip(split_text, data))

When I run flat_map for chunking, it throws the following error:

chunked_ds = ds.flat_map(chunk_udf)
chunked_ds.show(5)

At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
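From searching around, this error seems to mean that a GPU-resident object (in my case, most likely the splitter captured from the driver into the task closure) is being pickled and then unpickled on a worker where torch.cuda.is_available() is False. One workaround I'm considering is building the heavyweight object lazily inside the task, one copy per worker process, instead of capturing it. Here is a pure-Python sketch of that pattern (no Ray/torch needed; str.split is a hypothetical stand-in for the real splitter):

```python
_splitter = None  # one instance per worker process

def get_splitter():
    # Build the heavy object on first use inside the worker instead of
    # shipping it from the driver. In the real task this is where the
    # splitter / model would be constructed, e.g. via
    # torch.load(path, map_location="cpu").
    global _splitter
    if _splitter is None:
        _splitter = str.split  # hypothetical stand-in for the splitter
    return _splitter

def chunk_udf(row):
    splitter = get_splitter()
    texts = row["content"]
    split_text = splitter(texts)
    return [text.replace("\n", " ") for text in split_text]
```

In the actual task, chunk_udf would still be decorated with @ray.remote(num_gpus=0.2); the point is that nothing GPU-resident crosses the pickle boundary at submit time.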

Is there something I need to change in my setup?
torch.cuda.is_available() returns True in the notebook.

I have also tried setting spark.task.resource.gpu.amount to 0, but it still throws the same error.

Hello @awoke101, can you share what you see with ray.cluster_resources()?
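For reference, on a healthy single-GPU g4dn.xlarge node you would expect the resource dict to include a GPU entry. The values below are illustrative only, not taken from your cluster:

```python
# Hypothetical ray.cluster_resources() output on a g4dn.xlarge node
# (4 vCPUs, 16 GB memory, 1 GPU); values are illustrative.
resources = {"CPU": 4.0, "GPU": 1.0, "memory": 17179869184.0}

# If the 'GPU' key is missing or zero, Ray never registered the device,
# which would match torch.cuda.is_available() being False inside tasks.
assert resources.get("GPU", 0.0) >= 1.0
```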