Memory not released to default levels: `ray::IDLE` Processes Not Released

RunLLM · November 12, 2025, 5:27pm

Here’s a revised NumpyStore pattern that keeps the array alive by handing its ObjectRef to a long-lived, detached actor before the worker exits. Because the actor, not the short-lived worker, owns the object, this is intended to avoid ReferenceCountingAssertionError and object loss when workers are recycled by max_calls or removed by autoscaling (Ray Discourse: reference counting assertion error, GitHub issue 18456):

import ray
import numpy as np

# Detached actor to persistently own and store ObjectRefs
@ray.remote
class PlasmaStoreActor:
    def __init__(self):
        self.refs = {}

    def store(self, key, obj_ref):
        self.refs[key] = obj_ref

    def retrieve(self, key):
        return self.refs.get(key)

# Start Ray and the actor (ensure it's detached and persistent)
ray.init()
store_actor = PlasmaStoreActor.options(
    name="PlasmaStoreActor", namespace="PlasmaStore", lifetime="detached", get_if_exists=True
).remote()

class NumpyStore:
    def __init__(self, data: np.ndarray, key: str):
        if not isinstance(data, np.ndarray):
            raise TypeError(type(data))
        self.shape = data.shape
        self.size = data.size
        self.ndim = data.ndim
        self.dtype = data.dtype
        self.length = len(data)
        # Put the array in the object store, owned by the actor
        # (_owner is an underscore-prefixed, experimental argument of ray.put)
        self.data_ref = ray.put(data, _owner=store_actor)
        # Store the ObjectRef in the actor before the worker exits. Ray resolves
        # top-level ObjectRef arguments to their values, so wrap the ref in a
        # list to pass the reference itself rather than a copy of the array.
        ray.get(store_actor.store.remote(key, [self.data_ref]))
        self.key = key

    def retrieve(self) -> np.ndarray:
        # Retrieve the wrapped ObjectRef from the actor and get the data
        (obj_ref,) = ray.get(store_actor.retrieve.remote(self.key))
        return ray.get(obj_ref)

    def __len__(self) -> int:
        return self.length

# Example usage in a remote function
@ray.remote(max_calls=2)
def generate_and_store_array(key, size):
    arr = np.random.random(size)
    NumpyStore(arr, key)  # constructing the store registers the ref with the actor
    return key

# Usage
key = "my_array"
size = (100, 100)
ray.get(generate_and_store_array.remote(key, size))

# Retrieve the array later, directly from the actor. Constructing another
# NumpyStore with dummy data under the same key would overwrite the stored ref.
(obj_ref,) = ray.get(store_actor.retrieve.remote(key))
arr = ray.get(obj_ref)
print(arr.shape)  # (100, 100)

Key points:

The ObjectRef is stored in the actor before the worker exits.
Retrieval always goes through the actor, ensuring the object is not lost if the original worker is gone.
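
To make the lifetime argument behind these key points concrete, here is a plain-Python analogy that does not use Ray at all. `Payload`, `Registry`, and `short_lived_worker` are hypothetical names invented for this sketch: the registry plays the role of the detached actor, and the function plays the role of a worker that exits. Ray's distributed reference counting works differently under the hood, so treat this only as an illustration of why the hand-off has to happen before the short-lived holder goes away.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large array; plain class instances support weak references."""

    def __init__(self, size):
        self.size = size


class Registry:
    """Hypothetical long-lived holder, playing the role of the detached actor."""

    def __init__(self):
        self._items = {}

    def store(self, key, obj):
        self._items[key] = obj

    def retrieve(self, key):
        return self._items.get(key)


def short_lived_worker(registry, key, size, probes):
    """Create a payload, record a weak probe, and optionally hand it off."""
    payload = Payload(size)
    probes[key] = weakref.ref(payload)  # observes lifetime without extending it
    if registry is not None:
        registry.store(key, payload)  # hand-off happens before the function returns
    # The local name `payload` disappears on return, like a worker exiting.


probes = {}
registry = Registry()

short_lived_worker(None, "lost", 10, probes)      # nobody keeps a reference
short_lived_worker(registry, "kept", 10, probes)  # registry keeps a reference
gc.collect()  # make collection deterministic on non-refcounting interpreters

print(probes["lost"]() is None)                       # True: object was dropped
print(probes["kept"]() is registry.retrieve("kept"))  # True: object survived
```

The object that was never handed off is gone as soon as its only holder returns, while the one stored in the registry stays reachable for later retrieval.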

Let me know if you want a breakdown of each step or further explanation!

Sources:

GitHub issue 18456
