Ray Actor RAM usage keep growing

Zhipeng_Ji · June 8, 2021, 9:05am

hi，
I am using Ray 1.3 for scaling up my rl codes, when my code runs several tens of minutes, the RAM of my server will be exhausted. The memory usage keeps growing while my code is runing. The simplified code like this：

import ray
import gc
import numpy as np
import time

ray.init(address="auto")


@ray.remote(memory=1 * 1024 * 1024 * 1024)
class Actor:
    def get_data(self):
        data = np.arange(6400).reshape((64, 100))
        d = {}
        d["data"] = data
        return d

    def get_episode_data(self):
        data = []
        for i in range(100 * 1):
            data.append(self.get_data())
        return data


@ray.remote(memory=2 * 1024 * 1024 * 1024)
class learner:
    def __init__(self):
        self.count = 0
        self.actor = Actor.remote()
        self.data = []

    def get_data(self):
        d = self.actor.get_episode_data.remote()
        return d

    def step(self):
        data = self.get_data()
        self.data.append(data)

    def clear(self):
        self.count += 1
        self.data = []


if __name__ == "__main__":
    lr = learner.remote()
    for i in range(2000000000):
        print(i, "-----------------------------")
        for j in range(20):
            lr.step.remote()

        lr.clear.remote()
        gc.collect()

In the simplified code, I use an actor to produce datas and a learner to get the datas. when I run this code, the RAM is keeping growing. As shown below.

at the begining
a few minutes later
a few minutes later

From my observations, if I slow down the speed of data production, the RAM usage will not keep growing. May be the data production and data getting is so fast that there is no enough time for Python or Ray to collect the garbage.

I still don’t know what the problem is. And how to solve the continuous growth of RAM usage？

Thanks！

sangcho · June 8, 2021, 9:09am

Is the object_store_memory section also growing together?

Zhipeng_Ji · June 8, 2021, 9:21am

I use ray memory to monitor the object_store_memory，it looks normal. the objects appear and disappear periodically, but the RAM is still keeping growing. and if learner get a lot of data every time , the objects will not disappear in a shot time

sangcho · June 8, 2021, 9:42am

Do you think you have a simliar setup as Memory leak from raylet in Ray 1.3 · Issue #16136 · ray-project/ray · GitHub?

sangcho · June 8, 2021, 9:43am

Would like to understand if it’s the same kind of regression.

Zhipeng_Ji · June 8, 2021, 10:32am

hi, I think this problem is not simliar as the problem in that link. I have tried my code with Ray 1.2, the memory still keep growing. In that link, the problem is not appear in Ray 1.2.

ericl · June 8, 2021, 5:45pm

Try adding a ray.get() to prevent an infinite number of calls queueing up on the learner:

ray.get(lr.clear.remote())

Zhipeng_Ji · June 9, 2021, 2:28am

hi, @ericl , thanks for the suggestion, I tried this，the RAM is still keeping growing. After the code has run for a few dozen seconds，The result is as follows：

the object_store_memory looks normal：

but the RAM is keeping growing, The detailed memory usage of learner shown in the ray dashboard is：

rss:1.01GB
vms:3.05GB
shared:32.57MB
text:2.22MB
lib:0KB
data:1.10GB
dirty:0KB

I want to use Ray to scale up my rl codes, I still can’t get around this problem.

Eager for advice and help.

Topic		Replies	Views
Driver memory increasing indefinitely when returning a numpy array Ray Core	0	268	December 16, 2020
Ray consumes all my RAM Ray Core	5	477	October 18, 2021
How to control the total memory of ray.serve? Ray Serve	3	851	November 10, 2021
Memory leak in ray Actor Ray Core	4	88	December 19, 2024
Memory usage increasing with newest torch version 2.0 above Ray Core	0	16	November 27, 2023

Ray Actor RAM usage keep growing

Related topics