Memory not released to default levels: `ray::IDLE` Processes Not Released

Your `NumpyStore` implementation is close, but it does not guarantee that the object persists if the worker that created the `ObjectRef` exits before the reference is safely stored in the actor. The critical difference is that you must explicitly call a method on the long-lived actor (e.g., `store_actor.store.remote(key, obj_ref)`) to store the `ObjectRef` before the worker exits (Ray Discourse: reference counting assertion error). Simply assigning the `ObjectRef` to a class variable in `NumpyStore` is not enough: if the worker dies, the reference count may not be updated correctly, and the object can be lost.

To ensure safety, always:

  1. Use `ray.put(data, _owner=actor)` to create the `ObjectRef`.
  2. Immediately call a method on the actor to store the `ObjectRef`.
  3. Only then allow the worker to exit.

If you skip step 2, or if the worker exits before the actor receives the reference, the object may be lost due to reference counting protocol limitations (GitHub issue 18456).

Would you like a revised NumpyStore pattern that guarantees safety?
