- Launch a pod in my k8s cluster with 8 CPU cores and 32 GB of memory
- pip install "ray[default]==2.3.1"
- Exec into the pod and launch Ray with:
ray start --head --block --port=6380 --dashboard-host="0.0.0.0"
- Submit the job 100 times from my local laptop with:
seq 100 | xargs -Iz ray job submit --runtime-env-json='{"working_dir": "./"}' -- python3 test.py
My job code (test.py):
import ray

ray.init(address='auto')

@ray.remote(num_cpus=0.01)
class MyActor:
    def ping(self):
        return 100

ACTOR_NUM = 100
l = []
for i in range(ACTOR_NUM):
    l.append(MyActor.remote())
for actor in l:
    ray.get(actor.ping.remote())
print("Job Done")
- Observe the resident memory (RES) of gcs_server: it grows by 15 MB with each job and never goes down, until OOM.
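
For anyone who wants to log the growth instead of watching top, here is a minimal sketch of how the RES of gcs_server could be sampled from inside the pod. It is not part of Ray. The helper names (`parse_vm_rss_kib`, `find_pids_by_name`, `sample_rss_mib`, `monitor`) are made up for illustration, and it assumes a Linux pod where the process name reported in /proc/<pid>/comm is exactly `gcs_server`.

```python
import os
import time


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    153600 kB".
            return int(line.split()[1])
    return None


def find_pids_by_name(name):
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    if not os.path.isdir("/proc"):
        return pids
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def sample_rss_mib(pid):
    """Return the resident set size of `pid` in MiB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            kib = parse_vm_rss_kib(f.read())
    except OSError:
        return None
    return None if kib is None else kib / 1024


def monitor(name="gcs_server", interval_s=10.0, samples=6):
    """Print the RES of the first process called `name` every `interval_s` seconds."""
    pids = find_pids_by_name(name)  # assumption: the process is named "gcs_server"
    if not pids:
        print(f"no process named {name!r} found")
        return
    pid = pids[0]
    for _ in range(samples):
        rss = sample_rss_mib(pid)
        if rss is None:
            print(f"pid {pid} is gone")
            return
        print(f"{time.strftime('%H:%M:%S')} pid={pid} RES={rss:.1f} MiB")
        time.sleep(interval_s)


if __name__ == "__main__":
    monitor()
```

Running this in the pod while the jobs are being submitted should print one line per sample. Raising `samples` and lowering `interval_s` gives a finer trace across the 100 submissions.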