Ray task scheduling troubleshooting

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.

From time to time I notice the following situation:
My job has processed some tasks. More tasks remain, but they stay in the “waiting for scheduling” state even though plenty of free workers are running.
Usually the problem appears after the cluster (EC2 instances) scales up.
Here are screenshots describing my issue:
[screenshot] Note: not all CPUs are used.
[screenshot] Note: no tasks are running, though there are free CPUs.

Can you recommend how to troubleshoot this issue?
It is probably related to a shortage of Ray virtual resources (num_cpus, memory). How can I check per-node resource allocation?
There are plenty of logs. Which log or component is responsible for task scheduling?
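
For the per-node question, here is a minimal sketch of one way to inspect this from Python. It assumes a driver that can attach to the running cluster with `ray.init(address="auto")`, and it assumes that the dashboard's “waiting for scheduling” label corresponds to the `PENDING_NODE_ASSIGNMENT` task state. The helper names `summarize_nodes` and `report` are hypothetical and used only for illustration; `ray.nodes()`, `ray.cluster_resources()`, `ray.available_resources()` and `ray.util.state.list_tasks` are existing Ray APIs.

```python
def summarize_nodes(nodes):
    """Return {node address: {resource name: total amount}} for alive nodes.

    `nodes` is expected to have the shape of the list returned by ray.nodes().
    """
    summary = {}
    for node in nodes:
        if not node.get("Alive"):
            continue  # dead nodes cannot accept tasks
        resources = {
            name: amount
            for name, amount in node.get("Resources", {}).items()
            if not name.startswith("node:")  # skip the per-node marker resource
        }
        summary[node["NodeManagerAddress"]] = resources
    return summary


def report():
    """Print per-node totals, cluster totals, and tasks waiting for a node."""
    import ray  # imported lazily so summarize_nodes works without Ray installed
    from ray.util.state import list_tasks

    ray.init(address="auto")  # attach to the already running cluster

    for address, resources in summarize_nodes(ray.nodes()).items():
        print(address, resources)

    print("cluster total:    ", ray.cluster_resources())
    print("cluster available:", ray.available_resources())

    # Assumption: these are the tasks shown as "waiting for scheduling".
    # detail=True is requested so that required_resources is populated.
    pending = list_tasks(
        filters=[("state", "=", "PENDING_NODE_ASSIGNMENT")],
        detail=True,
    )
    for task in pending:
        print(task.name, task.required_resources)
```

Calling `report()` from a machine that is part of the cluster prints what each node offers next to what each pending task requests. Comparing the two may show whether a memory or custom resource request, rather than CPU, is what keeps tasks from being placed. The `ray status` CLI command gives a similar cluster-level summary.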

Versions:
ray: 2.9.2
Python: 3.10.12

Thank you

I solved the issue by installing monitoring. This can be closed.