Garbage collection through cloudpickle

I did an interesting experiment. Start two servers with GPUs attached, make one of them the head node and the other a worker node, and connect them into a Ray cluster.

Now run the following code on the worker node:

import ray
import torch

a = torch.randn(10, 1024, 1024).cuda()
z = ray.put(a)
out = ray.cloudpickle.dumps(z)  # serialize the ObjectRef itself

Now paste this output to the head node:
a = ray.get(ray.cloudpickle.loads(OUTPUT_FROM_ABOVE))

a is now auto-magically a torch CUDA tensor. Bravo, ray.get!

But now the memory footprint, ~40 MB, is on both machines! I have found no way to free this 40 MB from either machine; Python del doesn’t work.
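The ~40 MB figure matches the math: torch.randn produces float32 by default, so the tensor occupies 10 × 1024 × 1024 elements at 4 bytes each:

```python
# shape (10, 1024, 1024), float32 = 4 bytes per element
elements = 10 * 1024 * 1024
size_mib = elements * 4 / 2**20
print(size_mib)  # 40.0 MiB
```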

Is it because I am doing something bad with cloudpickle that breaks Ray’s reference counting?

cc @suquark on this - are you aware of such behaviour from ray cloudpickle?

ray.cloudpickle is a private API in Ray and should only be used for debugging purposes. Some Ray functionality will not work properly when ray.cloudpickle is used on its own.

I think what happened here is that ray.cloudpickle.dumps pinned the Ray ObjectRef (via ray._private.serialization.SerializationContext.add_contained_object_ref). When serialization happens inside Ray, the ref is unpinned later once the receiver registers its own reference; here it stays pinned forever because ray.cloudpickle.dumps was used alone.
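A toy sketch of that pin/unpin bookkeeping may make the leak clearer. The class and method names below are modeled loosely on Ray’s internal SerializationContext, but the implementation is invented purely for illustration, not Ray’s real code:

```python
class SerializationContext:
    """Toy model of Ray's ObjectRef pinning (hypothetical simplification)."""

    def __init__(self):
        self.pinned = {}  # ref -> pin count

    def add_contained_object_ref(self, ref):
        # Called while serializing: pin the object so it survives
        # until the receiving worker registers its own reference.
        self.pinned[ref] = self.pinned.get(ref, 0) + 1

    def release(self, ref):
        # Ray's own transport path triggers this after the receiver
        # takes ownership; a bare cloudpickle.dumps never does.
        self.pinned[ref] -= 1
        if self.pinned[ref] == 0:
            del self.pinned[ref]


ctx = SerializationContext()
ref = "ObjectRef(abc123)"

# Bare dumps: the pin happens but the matching release never runs,
# so the object is held forever on both machines.
ctx.add_contained_object_ref(ref)
print(ctx.pinned)  # {'ObjectRef(abc123)': 1}

# Inside Ray, the matching release would eventually run:
ctx.release(ref)
print(ctx.pinned)  # {}
```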