I was working with a function that needs large files multiple times, so I used ray.put() to store them. When I moved to even larger data, I got this error: Detected 1 oom-kill event(s) in StepId=49664180.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler. I assume this is because I have already stored a lot of objects in memory. I am now trying to store my data selectively. Is it possible to reverse .put() so that data I no longer need doesn't take up space?
Ray uses automatic reference counting. Once you remove all references to an object, it should be evicted from the object store automatically (eagerly, on a best-effort basis). For example:
reference = ray.put(big_object)
del reference # No more reference -> Ray will delete the object
Also note that you can inspect the objects currently in the object store with the ray memory CLI command.
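To make the "no more references" condition concrete, here is a small analogy in plain CPython. It is not Ray code and does not touch the object store. It only shows the general reference-counting idea that an object is freed once the last strong reference goes away, and that any second reference you still hold keeps it alive. BigObject, is_alive and the variable names are hypothetical, made up for this sketch.

```python
import weakref


class BigObject:
    """Hypothetical stand-in for a large payload (not a Ray type)."""

    def __init__(self, size):
        self.data = bytearray(size)


def is_alive(probe):
    """Return True while the object behind the weak reference still exists."""
    return probe() is not None


first = BigObject(1024)
probe = weakref.ref(first)  # weak reference: observes the object without keeping it alive
second = first              # a second strong reference to the same object

del first
alive_after_first_del = is_alive(probe)   # `second` still refers to the object

del second
alive_after_second_del = is_alive(probe)  # no strong references remain
```

By analogy, deleting one variable that holds a Ray object reference would not be expected to free the object while a copy of that reference is still held somewhere else, for example in a list.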
@sangcho Thanks. This worked, but the program's run time seems to increase with each iteration and I am not sure why.
for x, y in tqdm(dataset):
    if x not in done:
        if holder != None:
            del holder
        holder = ray.put(data_formatted[x])
        done.append(x)
    ids.append(single_query.remote(x, holder, y, data_formatted[y]))
This is my implementation for selectively storing my data. Earlier I only used ray.get([single_query.remote(x, data_formatted[x], y, data_formatted[y]) for x, y in tqdm(dataset)]). Is there anything obviously wrong with what I am doing here?