Last run of a grid search is hanging

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello,

I created a new environment with Python 3.8, PyTorch 1.12, and Ray 1.9.1. The last couple of times I ran a grid search, the last run just hung. The summary table with the parameters and losses keeps being printed, but nothing else happens.
I originally started with an install of Ray Tune 2.0, but I saw the same issue with that as well.

RAM and GPU usage are minimal too.

How can I debug why the last process just stalls? I think it has been stuck in this state for the last few hours.

My previous conda environment with Python 3.6, PyTorch 1.9, and Ray 1.9.1 did not have this issue.

htop shows mostly ray::IDLE as well.

Hey @partially_observed, thanks for raising this issue.

Could you share a reproducible example with us so that we can help you debug this?
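
In the meantime, one generic way to see where a stalled Python process is stuck is the standard library's faulthandler module, which can dump every thread's stack trace once a timeout passes. The snippet below is only a minimal sketch and is not Ray-specific: `train_step` and `run_trial` are hypothetical stand-ins for your own training code, and the assumption is that you would arm the watchdog at the start of whatever function runs a single trial.

```python
import faulthandler
import tempfile
import time


def train_step(step: int) -> float:
    """Hypothetical stand-in for one training iteration; returns a fake loss."""
    time.sleep(0.01)
    return 1.0 / (step + 1)


def run_trial(num_steps: int, timeout_s: float, log_file) -> float:
    """Run a toy trial and dump all thread stacks to log_file if it takes
    longer than timeout_s seconds. log_file must be a real file object
    (faulthandler needs a file descriptor)."""
    # Arm the watchdog: if we are still running after timeout_s seconds,
    # the stack traces of all threads are written to log_file.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
    try:
        loss = 0.0
        for step in range(num_steps):
            loss = train_step(step)
        return loss
    finally:
        # Disarm the watchdog once the trial finishes normally.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".log", delete=False) as log:
        final_loss = run_trial(num_steps=5, timeout_s=60.0, log_file=log)
        print("final loss:", final_loss, "stack dump (if any) in:", log.name)
```

If the trial stalls, the dump shows which line each thread was executing at the timeout, which is usually enough to tell a blocked data loader from code that is waiting on something else.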