All tasks in PENDING_NODE_ASSIGNMENT but workers' CPUs are busy

I am loading data from a GCS bucket path that contains a large number of Parquet files. I can see thousands of tasks named “_map_task” in the PENDING_NODE_ASSIGNMENT state and no tasks in the RUNNING state, yet most of the workers’ CPU cores are more than 80% busy.

My code looks like this:

import gcsfs
import ray

# The path contains thousands of Parquet files.
data = ray.data.read_parquet("path", filesystem=gcsfs.GCSFileSystem())
data = data.select_columns(cols=[...])
data.take(10)

How do I find out what the cluster is doing?

You can use the Ray Dashboard to see which tasks and actors are running, and to inspect the current stack traces of the individual processes:

https://docs.ray.io/en/latest/ray-observability/getting-started.html#set-up-dashboard
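If the dashboard is hard to reach, roughly the same picture can be pulled from a driver script. The following is only a sketch, assuming a Ray 2.x release where the state API is importable from `ray.util.state` and the driver can attach to the cluster with `address="auto"`. The helper names `count_by_state` and `print_cluster_snapshot` are made up for illustration and are not part of Ray.

```python
from collections import Counter


def count_by_state(tasks):
    """Count task records by scheduling state.

    Accepts plain dicts or objects that expose a `state` attribute.
    """
    counts = Counter()
    for task in tasks:
        state = task["state"] if isinstance(task, dict) else task.state
        counts[state] += 1
    return dict(counts)


def print_cluster_snapshot(limit=10000):
    """Print task counts per state plus total and free cluster resources.

    Assumption: Ray 2.x with `ray.util.state.list_tasks` available and a
    running cluster reachable via address="auto".
    """
    import ray
    from ray.util.state import list_tasks

    ray.init(address="auto", ignore_reinit_error=True)

    # Only up to `limit` tasks are returned, so counts may be truncated.
    tasks = list_tasks(limit=limit)
    print("tasks by state:", count_by_state(tasks))

    # Comparing these two shows whether logical CPUs are already claimed.
    print("total resources:    ", ray.cluster_resources())
    print("available resources:", ray.available_resources())
```

Calling `print_cluster_snapshot()` while the job is stuck shows how many tasks sit in each state and whether any logical CPUs are still unclaimed, which helps separate "no free resources" from "work running outside the task scheduler".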