ray::IDLE still takes a lot of memory

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I started a Ray cluster on a local node with 100 cores. The first set of tasks loads a lot of data, so each of the Ray remote processes will likely use >10GB of memory. After that, only a few follow-up tasks are started. But when I check the system with htop, I see that the ray::IDLE processes still hold >10GB of memory each, which sometimes causes OOM errors. It seems the memory is not being released. Is it normal to see this? Is there a setting that forces the workers to release memory?
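For reference, the heavy step looks roughly like the sketch below (the real data load is different; the array size is just a placeholder). Would setting `max_calls=1` on such tasks, so the worker process exits and returns its memory after each call instead of staying around as ray::IDLE, be the right way to handle this?

```python
import numpy as np
import ray

ray.init(num_cpus=100)

# Simplified stand-in for the real data-loading task. The question is whether
# max_calls=1 here would make the worker process exit after the call and give
# the >10GB back to the OS instead of keeping it in a ray::IDLE process.
@ray.remote(max_calls=1)
def load_shard(shard_id: int) -> float:
    data = np.random.rand(20_000, 20_000)  # ~3 GB placeholder, not the real load
    return float(data.sum())

results = ray.get([load_shard.remote(i) for i in range(8)])
```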

I cannot post anything from my work computer due to security restrictions.

@Daniel_Xie Can you take a look at the Ray Dashboard and check tasks and memory usage? Drill down into the tasks view to see the state of each task and which workers are IDLE.

cc: @rickyyx
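If it helps to inspect this without the dashboard UI, the Python state API exposes the same task and worker information. A minimal sketch, assuming Ray 2.x where `ray.util.state` is available (field names can differ slightly between versions):

```python
import ray
from ray.util.state import list_tasks, list_workers

ray.init(address="auto")  # attach to the already-running cluster

# Worker processes known to the cluster; the ray::IDLE ones are workers that
# are alive but not currently executing a task.
for worker in list_workers():
    print(worker)

# Per-task view; the state field (PENDING/RUNNING/FINISHED/...) shows which
# tasks have completed even though their worker process is still around.
for task in list_tasks():
    print(task)
```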

I’m facing a very similar (if not the same) issue.

In my case, I’m trying to trigger a Ray Workflow from Ray Serve. The Workflow itself needs to run on a large node, since it needs to load a heavy model and run inference with it.

The first couple of Workflow runs work, but by the 3rd or 4th try I start getting OOM errors, and in the dashboard I see several ray::IDLE processes, each using around 15GB of RAM.
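In case it’s relevant: the workaround I’m experimenting with is giving the heavy step an explicit memory reservation, so the scheduler accounts for the ~15GB and doesn’t place more of these runs on the node than it can hold, plus `max_calls=1` so the worker exits and releases the model afterwards. A rough sketch, with the Serve/Workflow wiring and the real model left out:

```python
import ray

ray.init()

# memory= is a scheduling hint: Ray will not place this task on a node unless
# ~15 GB of its memory resource is available. max_calls=1 makes the worker
# process exit after one invocation, so the loaded model is released instead
# of lingering in a ray::IDLE worker.
@ray.remote(memory=15 * 1024**3, max_calls=1)
def run_inference(payload: list[float]) -> float:
    # Stand-in for loading the large model and running inference with it.
    weights = [0.5] * len(payload)
    return sum(w * x for w, x in zip(weights, payload))

print(ray.get(run_inference.remote([1.0, 2.0, 3.0])))
```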