How to get the gcs_server memory distribution to debug continuously increasing memory?

Wanxing_Wang · April 20, 2023, 8:27am

Launch a pod in my k8s cluster with 8 CPU cores and 32 GB of memory (8c32g)
pip install "ray[default]"==2.3.1
Exec into the pod and launch Ray with:

ray start --head --block --port=6380 --dashboard-host="0.0.0.0"

Submit the job 100 times from my local laptop with:

seq 100 | xargs -Iz ray job submit --runtime-env-json='{"working_dir": "./"}' -- python3 test.py

My job code (test.py):

import ray

ray.init(address='auto')

@ray.remote(num_cpus=0.01)
class MyActor:
    def ping(self):
        return 100

ACTOR_NUM = 100
l = []
for i in range(ACTOR_NUM):
    l.append(MyActor.remote())

for actor in l:
    ray.get(actor.ping.remote())

print("Job Done")

Observe the memory (RES) of gcs_server: it grows by 15 MB with each job and never goes down until OOM.
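To put numbers on the per-job growth, here is a minimal sketch that samples the resident memory of gcs_server from /proc while the jobs are being submitted. It only uses the Python standard library and assumes a Linux pod where the process name shows up as `gcs_server` in `/proc/<pid>/comm`; the helper names and the 5 second interval are made up for illustration. It records the growth over time but does not show which allocations inside gcs_server are responsible.

```python
import os
import re
import time


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def find_pids_by_name(name, proc_root="/proc"):
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return pids


def sample_rss_kib(pid, proc_root="/proc"):
    """Read the current resident set size of `pid` in KiB."""
    with open(os.path.join(proc_root, str(pid), "status")) as f:
        return parse_vm_rss_kib(f.read())


if __name__ == "__main__":
    # Assumption: the GCS process is named "gcs_server", as it appears in top.
    pids = find_pids_by_name("gcs_server")
    if not pids:
        raise SystemExit("no gcs_server process found")
    pid = pids[0]
    previous = None
    while True:
        rss = sample_rss_kib(pid)
        if rss is None:
            raise SystemExit("VmRSS not available for pid %d" % pid)
        delta = 0 if previous is None else rss - previous
        print("%s pid=%d rss=%d KiB delta=%+d KiB"
              % (time.strftime("%H:%M:%S"), pid, rss, delta))
        previous = rss
        time.sleep(5)  # hypothetical sampling interval
```

Running this inside the pod while the submit loop runs from the laptop gives a timestamped log, which makes it easy to check whether the RES increase lines up with each job submission.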
