RAM is not released after the last reference to an object in the Object Store is deleted

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I’m running into a memory management issue while getting started with Ray. The code to reproduce the issue is below. In VSCode’s debug mode, after running the second-to-last line (del big_data_object_ref), big_data_object_ref no longer shows up in “ray memory”, but the RAM usage I read from “htop” still does not decrease.

Here are the results of “ray memory” and “htop” before executing the line “del big_data_object_ref”:

======== Object references status: 2023-01-19 22:17:50.509337 ========
Grouping by node address...        Sorting by object size...        Display allentries per group...


--- Summary for node address: 128.100.153.166 ---
Mem Used by Objects  Local References  Pinned        Used by task   Captured in Objects  Actor Handles
1076480259.0 B       1, (1076480259.0 B)  0, (0.0 B)    0, (0.0 B)     0, (0.0 B)           0, (0.0 B)

--- Object references for node address: 128.100.153.166 ---
IP Address       PID    Type    Call Site               Status          Size    Reference Type      Object Ref
128.100.153.166  191212  Driver  disabled                FINISHED        1076480259.0 B  LOCAL_REFERENCE     00ffffffffffffffffffffffffffffffffffffff0100000001000000

To record callsite information for each ObjectRef created, set env variable RAY_record_ref_creation_sites=1

--- Aggregate object store stats across all nodes ---
Plasma memory usage 1026 MiB, 1 objects, 1.56% full, 1.56% needed
Objects consumed by Ray tasks: 4106 MiB.
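
As a sanity check on the numbers above, the reported object size can be compared with the raw size of the 11600 x 11600 float64 array created in the reproduction script. This is only a rough sketch; attributing the small difference to serialization metadata is an assumption, not something the output states.

```python
# Shape and dtype taken from the reproduction script: np.random.rand(11600, 11600)
rows, cols = 11600, 11600
bytes_per_float64 = 8

# Raw size of the array data in bytes.
payload_bytes = rows * cols * bytes_per_float64

# "Mem Used by Objects" as printed by `ray memory` above.
reported_bytes = 1076480259

# Difference between the stored object and the raw array data
# (assumed here to be serialization metadata).
overhead_bytes = reported_bytes - payload_bytes

# The same size expressed in MiB, for comparison with "Plasma memory usage".
reported_mib = reported_bytes / 2**20

print(payload_bytes)           # 1076480000
print(overhead_bytes)          # 259
print(round(reported_mib, 1))  # 1026.6
```

So the single entry in the table corresponds to the array passed to ray.put, and the 1026 MiB of Plasma usage is that same object.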

And here are the results after the line “del big_data_object_ref”:

======== Object references status: 2023-01-19 22:18:40.908331 ========
Grouping by node address...        Sorting by object size...        Display allentries per group...


To record callsite information for each ObjectRef created, set env variable RAY_record_ref_creation_sites=1

--- Aggregate object store stats across all nodes ---
Plasma memory usage 0 MiB, 0 objects, 0.0% full, 0.0% needed
Objects consumed by Ray tasks: 4106 MiB.

The memory usage shown in “htop” is still 21.3G.

Could anyone help me with this? Am I getting something wrong?

import copy
import time

import numpy as np
import ray


# Start a local Ray instance only if one is not already running.
if not ray.is_initialized():
    ray.init(num_cpus=6, include_dashboard=False)


@ray.remote
def my_function(big_data_object, x):
    # Simulate some work, then read a single element of the shared array.
    time.sleep(1)
    return big_data_object[0, 0] + x


big_data_object = np.random.rand(11600, 11600)  # float64 array of approx. 1 GB
big_data_object_ref = ray.put(big_data_object)  # copy the array into the object store

result_refs = []
for item in range(4):
    result_refs.append(my_function.remote(big_data_object_ref, item))

# Deep-copy the results so they do not reference object store memory.
results = copy.deepcopy(ray.get(result_refs))
del result_refs
del big_data_object_ref
print(results)

The Ray object store is started with a configured size, so it will always use that much memory regardless of whether it currently holds any objects. In other words, once an object is GCed, the object store memory is not released; it stays reserved for future objects.
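
If the reserved size itself is the concern, it can be set explicitly when Ray starts. The following is a minimal sketch, assuming a local ray.init call like the one in the question; the helper names gib_to_bytes and init_ray_with_store_cap are made up for illustration, while object_store_memory is the ray.init argument that takes the store size in bytes. How much this changes the figure shown in htop depends on how the system accounts for shared memory, so treat it as something to try rather than a guaranteed fix.

```python
def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to a whole number of bytes."""
    return int(gib * 1024**3)


def init_ray_with_store_cap(store_gib: float, num_cpus: int = 6) -> None:
    """Start a local Ray instance with an explicitly sized object store.

    Hypothetical helper for illustration only; Ray is imported lazily so
    that gib_to_bytes can be used without Ray installed.
    """
    import ray

    if not ray.is_initialized():
        ray.init(
            num_cpus=num_cpus,
            include_dashboard=False,
            # Size of the object store in bytes.
            object_store_memory=gib_to_bytes(store_gib),
        )
```

For example, calling init_ray_with_store_cap(2) before any ray.put would reserve roughly 2 GiB for the object store, which still leaves room for the approx. 1 GB array in the question.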