Opposite of ray.put()

Hi,

I am working with a function that needs the same large files multiple times, so I used ray.put() to store them. When I moved to even larger data, I got this error: Detected 1 oom-kill event(s) in StepId=49664180.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler. I assume this is because I have already stored a lot of objects in memory. I am now trying to store my data selectively. Is it possible to reverse ray.put() so that data I no longer need stops taking up space?

Ray uses automatic reference counting. If you remove all references to an object, it is evicted from the object store automatically (eagerly, on a best-effort basis). For example:

reference = ray.put(big_object)
del reference # No more reference -> Ray will delete the object

Also note that you can inspect the objects currently in the object store with the ray memory CLI command.

For more details, see the Memory Management page in the Ray v2.0.0.dev0 documentation.

@sangcho Thanks. This worked, but the program run time seems to increase with each iteration and I am not sure why.

for x, y in tqdm(dataset):
	if x not in done:
		if holder is not None:
			del holder
		holder = ray.put(data_formatted[x])
		done.append(x)
	ids.append(single_query.remote(x, holder, y, data_formatted[y]))

This is my implementation for selectively storing my data. Earlier I only used ray.get([single_query.remote(x, data_formatted[x], y, data_formatted[y]) for x, y in tqdm(dataset)]). Is there anything obviously wrong with what I am doing here?

increase with each iteration

To be clear, does that mean each iteration takes longer than the one before, or that the new approach is slower overall than the previous one?
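
Either way, a few things in the loop above may be worth checking. `x not in done` scans a list, so that check gets slower as `done` grows. If the dataset is not grouped by x, `holder` can still point to a different x's data when an already-seen x comes back. And as far as I know, Ray keeps an object alive while submitted tasks still take it as an argument, so `del holder` alone may not free anything until those tasks finish. Below is a minimal sketch of one way to restructure it, not a definitive fix. `group_by_first` and `run_grouped` are hypothetical helper names; `single_query` is assumed to be your existing @ray.remote function, and Ray is assumed to be initialized already.

```python
from collections import defaultdict


def group_by_first(dataset):
    """Group (x, y) pairs by x, keeping the first-seen order of each x."""
    groups = defaultdict(list)
    for x, y in dataset:
        groups[x].append(y)
    return dict(groups)


def run_grouped(dataset, data_formatted, single_query):
    """Put each x object once, run all of its queries, then release it.

    Assumption: single_query is a @ray.remote function with the signature
    used in the post: single_query(x, x_data, y, y_data).
    """
    import ray  # imported here so group_by_first can be used without Ray

    results = []
    for x, ys in group_by_first(dataset).items():
        # Store the large x object once for every pair that shares this x.
        holder = ray.put(data_formatted[x])
        ids = [
            single_query.remote(x, holder, y, data_formatted[y]) for y in ys
        ]
        # Wait for this group so no pending task still references the object.
        for y, result in zip(ys, ray.get(ids)):
            results.append((x, y, result))
        # Drop the remaining references so the object can be evicted.
        del holder, ids
    return results
```

Note that the results come back grouped by x rather than in the original dataset order, and waiting after each group trades some parallelism across groups for a lower peak in object store usage. If memory allows, you could wait on several groups at a time instead.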