Ray actor RAM usage keeps growing

Hi,
I am using Ray 1.3 to scale up my RL code. After my code runs for several tens of minutes, the RAM of my server is exhausted; memory usage keeps growing the whole time the code is running. A simplified version of the code looks like this:

import gc

import numpy as np
import ray

ray.init(address="auto")


@ray.remote(memory=1 * 1024 * 1024 * 1024)
class Actor:
    def get_data(self):
        # One small chunk of fake rollout data.
        data = np.arange(6400).reshape((64, 100))
        return {"data": data}

    def get_episode_data(self):
        # An "episode" is 100 chunks.
        return [self.get_data() for _ in range(100)]


@ray.remote(memory=2 * 1024 * 1024 * 1024)
class Learner:
    def __init__(self):
        self.count = 0
        self.actor = Actor.remote()
        self.data = []

    def get_data(self):
        # Kick off an episode on the actor; this returns an ObjectRef,
        # not the materialized data.
        return self.actor.get_episode_data.remote()

    def step(self):
        self.data.append(self.get_data())

    def clear(self):
        self.count += 1
        self.data = []


if __name__ == "__main__":
    lr = Learner.remote()
    for i in range(2000000000):
        print(i, "-----------------------------")
        for j in range(20):
            lr.step.remote()

        lr.clear.remote()
        gc.collect()

In this simplified code, one actor produces data and a learner fetches it. When I run it, RAM usage keeps growing, as shown below.

[Screenshots of RAM usage: (1) at the beginning, (2) a few minutes later, (3) a few minutes after that.]

From my observations, if I slow down the rate of data production, RAM usage stops growing. Maybe the data is produced and fetched so fast that Python or Ray has no time to collect the garbage.
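
To make "slow down" concrete, here is a minimal sketch of the kind of throttling I mean: cap the number of in-flight step() calls with ray.wait (max_in_flight is just an illustrative value):

import ray


def run_throttled(lr, num_steps, max_in_flight=4):
    # Keep at most max_in_flight step() calls pending on the learner.
    in_flight = []
    for _ in range(num_steps):
        if len(in_flight) >= max_in_flight:
            # Block until at least one earlier step has finished.
            _, in_flight = ray.wait(in_flight, num_returns=1)
        in_flight.append(lr.step.remote())
    # Drain whatever is still pending.
    ray.get(in_flight)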

I still don't know what the problem is, or how to stop the continuous growth in RAM usage.

Thanks!

Is the object_store_memory usage growing as well?

I used the ray memory command to monitor the object store, and it looks normal: the objects appear and disappear periodically, but the RAM keeps growing. And if the learner fetches a lot of data each time, the objects do not disappear in a short time.

Do you think you have a similar setup to Memory leak from raylet in Ray 1.3 · Issue #16136 · ray-project/ray · GitHub?

I'd like to understand whether it's the same kind of regression.

Hi, I don't think this problem is similar to the one in that link. I tried my code with Ray 1.2 and the memory still keeps growing, whereas the problem in that link does not appear in Ray 1.2.

Try adding a ray.get() to prevent an infinite number of calls queueing up on the learner:

ray.get(lr.clear.remote())
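
Applied to the driver loop above, that looks roughly like this (a sketch; you could also ray.get the step() calls for stronger backpressure):

lr = Learner.remote()
for i in range(2000000000):
    print(i, "-----------------------------")
    step_refs = [lr.step.remote() for _ in range(20)]
    # Wait for the steps so calls cannot pile up on the learner.
    ray.get(step_refs)
    # Block on clear() before starting the next iteration.
    ray.get(lr.clear.remote())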

Hi @ericl, thanks for the suggestion. I tried this, but the RAM still keeps growing. After the code has run for a few dozen seconds, the result is as follows:
[screenshot of RAM usage]

The object_store_memory looks normal, but the RAM keeps growing. The detailed memory usage of the learner, as shown in the Ray dashboard:

rss: 1.01 GB
vms: 3.05 GB
shared: 32.57 MB
text: 2.22 MB
lib: 0 KB
data: 1.10 GB
dirty: 0 KB
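
Those fields correspond to psutil's memory_info(); for reference, a minimal sketch (assuming psutil is available, as Ray depends on it) that reports the same rss figure from inside the actor:

import os

import psutil


def actor_rss_mb():
    # Resident memory of the current process in MB; called from an actor
    # method, this matches the "rss" figure shown in the dashboard.
    return psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)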

I want to use Ray to scale up my RL code, but I still can't get around this problem.

Any advice or help would be much appreciated.