With Ray autoscaling enabled, ray::IDLE processes can appear to “leak” (i.e., persist and retain memory) because idle workers are not necessarily killed as soon as they become idle, especially if the cluster is not configured to scale down aggressively. Even with max_calls=2, idle workers may be kept alive to avoid repeated worker startup costs, and they can retain memory from previous tasks (Ray autoscaler docs, Ray Discourse: idle workers not releasing resources). Without autoscaling, Ray is more likely to kill idle workers promptly; with autoscaling, they may be kept around in anticipation of new work, so their memory is not released.
This is expected Ray behavior and is not directly controlled by max_calls, which only limits how many times a worker executes a given remote function before it exits. Autoscaler policies and idle-worker thresholds determine when idle workers are terminated. These can be tuned, but aggressive downscaling adds cold-start latency for new tasks.
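To check whether idle workers are actually the ones holding memory on a node, it can help to list the ray::IDLE processes along with their resident memory. Below is a minimal sketch using only the Python standard library. It assumes a Linux or macOS `ps` that accepts `-eo pid=,rss=,args=` and reports RSS in KiB, and that idle workers show up with `ray::IDLE` in their process title. The names `IdleWorker`, `parse_idle_workers`, and `list_idle_workers` are made up for this example and are not part of Ray.

```python
import subprocess
from typing import List, NamedTuple


class IdleWorker(NamedTuple):
    pid: int
    rss_mib: float
    command: str


def parse_idle_workers(ps_output: str, marker: str = "ray::IDLE") -> List[IdleWorker]:
    """Extract idle Ray workers from `ps -eo pid=,rss=,args=` style output."""
    workers = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, rss (assumed KiB), full command line
        if len(parts) < 3 or marker not in parts[2]:
            continue
        try:
            pid, rss_kib = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # skip header or malformed lines
        workers.append(IdleWorker(pid, rss_kib / 1024, parts[2].strip()))
    # Largest memory holders first.
    return sorted(workers, key=lambda w: w.rss_mib, reverse=True)


def list_idle_workers() -> List[IdleWorker]:
    """Run `ps` on the local node and return its idle Ray workers."""
    # Assumption: this `ps` supports -e and -o, with "=" suppressing headers.
    out = subprocess.run(
        ["ps", "-eo", "pid=,rss=,args="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_idle_workers(out)
```

Calling `list_idle_workers()` on a worker node a few times, some minutes apart, should show whether the same PIDs stay around with a large RSS (workers being kept alive) or are replaced by fresh, smaller processes (workers being recycled, for example by max_calls).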
Would you like more detail on tuning autoscaler or worker cleanup parameters?