How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty in completing my task, but I can work around it.
Hi team, I'm following the GitHub issue "[Actor] Possible extra memory consumption" (ray-project/ray, Issue #37291), which has not been updated for a while. Since it may be more of a Ray usage question, I'm hoping the discussion forum is a better place to ask for help.
Regarding the issue itself, I have found a simpler way to reproduce it:
import ray
import numpy as np
import time
import psutil


class Driver:
    def gen(self):
        actors = [Actor1.remote() for i in range(5)]
        data = ray.get([actor.gen.remote() for actor in actors])
        np.sum(data)


@ray.remote
class Actor1:
    def gen(self):
        return np.random.rand(100000000)


if __name__ == "__main__":
    configs = {
        "memory_monitor_refresh_ms": 0,
        "memory_usage_threshold": 1,
        "free_objects_period_milliseconds": 0,
    }
    ray.init(_system_config=configs)
    driver = Driver()
    while True:
        driver.gen()
        # subprocess.run(["ray", "memory"])
        print(psutil.Process().memory_info().rss / 1024 / 1024)
        time.sleep(1)
"""
Case 1: Output of this script
$ python ray_37291.py
2023-08-01 23:31:53,972 INFO worker.py:1616 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
3927.51953125
3927.52734375
3927.53515625
...
Case 2: Output if L11 (the np.sum statement) was deleted
$ sed '11d' ray_37291.py > ray_37291_altered.py && python ray_37291_altered.py
2023-08-01 23:32:37,463 INFO worker.py:1616 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
113.59375
113.73828125
113.73828125
...
"""
Expected Results
In Case 1, RSS should still go back down to ~100 MB once driver.gen() has completed, because nothing holds a reference to those numpy arrays any longer.
It seems like running np.sum on the objects returned from Actor1 pins those objects in the object store, but the output of ray memory (obtained by uncommenting the subprocess.run statement in the while loop, which also needs import subprocess) suggests that there are no object references to those objects:
======== Object references status: 2023-08-01 23:40:13.832909 ========
Grouping by node address... Sorting by object size... Display allentries per group...
To record callsite information for each ObjectRef created, set env variable RAY_record_ref_creation_sites=1
--- Aggregate object store stats across all nodes ---
Plasma memory usage 0 MiB, 0 objects, 0.0% full, 0.0% needed
Objects consumed by Ray tasks: 762 MiB.
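To help narrow this down, here is a minimal sketch of how the RSS figure could be split into shared and private parts. It rests on an assumption I have not confirmed: that part of the reported RSS may be shared memory mapped from the object store rather than memory owned by the driver process. The helper name rss_breakdown_mib is hypothetical, and the sketch is Linux-only because it reads /proc/<pid>/statm.

```python
import os


def rss_breakdown_mib(pid="self"):
    """Return resident, shared, and private (resident - shared) memory in MiB.

    Linux only: reads /proc/<pid>/statm, whose second and third fields are
    the resident and shared page counts of the process.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/statm") as f:
        fields = f.read().split()
    resident = int(fields[1]) * page_size
    shared = int(fields[2]) * page_size
    mib = 1024 * 1024
    return {
        "rss": resident / mib,
        "shared": shared / mib,
        "private": (resident - shared) / mib,
    }


if __name__ == "__main__":
    # Print the breakdown for the current process.
    print(rss_breakdown_mib())
```

Printing rss_breakdown_mib() in the while loop next to the existing RSS print would show whether the ~3.9 GB in Case 1 is mostly "shared" or mostly "private".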
Any insights into why we're seeing this? Many thanks!