I have a pipeline where a camera actor continuously acquires images, pre-processes each image, sends it to a GPU actor, and then passes the result on to a post-processing actor (which prints the result to the console). At no point in the entire process do I invoke ray.get, because I don't want to introduce a blocking call at any step. I am observing that the memory/RAM consumed by the PostProcessActor slowly increases over time. The plasma store memory stays roughly constant (I am monitoring it via the dashboard), but the process RAM keeps growing. I am not sure where exactly the leak is, and it would be great to get some pointers on how to debug this. Could there be a problem if new tasks are constantly created but their results are never fetched via ray.get? My pipeline roughly looks like this:
import ray
import numpy as np

@ray.remote
class CameraActor:
    def __init__(self, gpu_actor, post_actor):
        self.gpu_actor = gpu_actor
        self.post_actor = post_actor

    def acquire(self):
        while True:
            # Raw camera frame, roughly 3000 x 5000 uint8 (~15 MB).
            cam_img = np.random.randint(0, 255, (3000, 5000)).astype(np.uint8)
            pre_img = self.preprocess(cam_img)  # of size 3 x 500 x 700
            infer_ref = self.gpu_actor.infer.remote(pre_img)
            # Fire-and-forget: the returned ObjectRef is never kept or fetched.
            self.post_actor.process.remote(cam_img, infer_ref)

    def preprocess(self, cam_img):
        # resize / normalize the raw frame down to 3 x 500 x 700
        return np.zeros((3, 500, 700), dtype=np.float32)

@ray.remote
class GPUActor:
    def __init__(self):
        pass

    def infer(self, pre_img):
        # do some work on the GPU
        return np.zeros((100, 100))

@ray.remote
class PostProcessActor:
    def __init__(self):
        pass

    def process(self, cam_img, infer_results):
        # do some post processing and arrive at a result
        result = ""
        print(result)
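For reference, the only mitigation I can think of is to keep the refs returned by process.remote() and occasionally block with ray.wait() to cap the number of in-flight post-processing tasks, roughly as in the sketch below (MAX_IN_FLIGHT is a made-up knob, and I have not actually wired this in, since it reintroduces a blocking call into the loop):

# Hypothetical variant of CameraActor.acquire() that keeps the returned
# ObjectRefs and applies backpressure with ray.wait().
    def acquire(self):
        MAX_IN_FLIGHT = 8  # made-up bound on queued process() calls
        in_flight = []
        while True:
            cam_img = np.random.randint(0, 255, (3000, 5000)).astype(np.uint8)
            pre_img = self.preprocess(cam_img)
            infer_ref = self.gpu_actor.infer.remote(pre_img)
            in_flight.append(self.post_actor.process.remote(cam_img, infer_ref))
            if len(in_flight) >= MAX_IN_FLIGHT:
                # Block until at least one process() call finishes, then drop
                # its ref so the associated objects can be reclaimed.
                _done, in_flight = ray.wait(in_flight, num_returns=1)

Is something like this necessary, or should the dropped ObjectRefs be cleaned up automatically?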