**Memory not released to baseline levels: `ray::IDLE` processes not releasing memory**

Ray's `ray::IDLE` processes not releasing memory after task completion is a known issue, especially under heavy or long-running workloads. Idle worker processes retain their memory allocations, and that memory is only reclaimed when the worker is killed, which typically happens when the node approaches `RAY_memory_usage_threshold`. Tuning parameters such as `kill_idle_workers_interval_ms` or `RAY_SERVE_ENABLE_PROXY_GC_OPTIMIZATIONS` often has little effect, and manual `gc.collect()` calls do not guarantee that memory is returned to the OS. This behavior is by design and is widely reported, particularly when large objects or high concurrency are involved. The problem is exacerbated by object references lingering in idle workers, and Ray deliberately avoids aggressive memory recycling to prevent performance penalties. There is no configuration that forces immediate memory release from idle workers without risking premature worker termination and 500 errors under load. See the Ray documentation on memory debugging, GitHub issue 52174, and the related discussions on idle memory.
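
If you still want to experiment with the knobs mentioned above, a minimal sketch of where they are set looks like the following. The parameter names are the ones referenced in this thread; the values are purely illustrative, `_system_config` is an internal/unstable interface, and lowering the threshold carries the premature-termination risk already described:

```python
import os
import ray

# The memory monitor reads these when the raylet starts, so set them before
# ray.init() on a single-node setup (on a cluster, set them in the environment
# of `ray start` on each node).
os.environ["RAY_memory_usage_threshold"] = "0.90"    # kill workers sooner (default 0.95)
os.environ["RAY_memory_monitor_refresh_ms"] = "250"  # check memory usage more often

ray.init(
    # How often the raylet scans for idle workers it can reap (milliseconds).
    _system_config={"kill_idle_workers_interval_ms": 1000},
)
```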

Regarding the performance regression after upgrading: `NumpyStore` and the increased number of `ray.get`/`ray.put` calls can slow down processing, but other factors include changes in Ray's internal scheduling, memory management, or object store behavior. Network load (e.g., Triton inference) and library versions can also affect performance, but the main cause of persistently high memory usage is Ray's worker and object store management, not your code or most third-party libraries. Unless you can reduce the size or lifetime of objects, or lower parallelism, there is no general workaround other than accepting a higher baseline memory usage or tuning the memory threshold with caution. For more, see the Ray memory troubleshooting docs and the related GitHub issues.
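
As a concrete example of reducing object size and lifetime, the pattern below stores a large array in the object store once, passes the same `ObjectRef` to every task instead of re-serializing the data, and drops the references as soon as the results are back so the entry can be evicted. The array shape and `row_sum` task are made up for illustration:

```python
import numpy as np
import ray

ray.init()

@ray.remote
def row_sum(array, start, stop):
    # Ray resolves the ObjectRef argument to the underlying ndarray
    # (zero-copy read-only view from the shared-memory object store).
    return float(array[start:stop].sum())

data = np.random.rand(10_000, 1_000)   # ~80 MB, stored in the object store once
data_ref = ray.put(data)
futures = [row_sum.remote(data_ref, i * 1_000, (i + 1) * 1_000) for i in range(10)]
totals = ray.get(futures)

# Drop the references promptly so the object store entry can be reclaimed
# instead of lingering for the lifetime of the driver.
del data_ref, futures
```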

Would you like a step-by-step breakdown of how to further diagnose or mitigate this, or more detail on specific Ray parameters?
