I am currently running on a single node and using actors to make predictions through a predict method. In this function I split my data into batches for a set of N actors (each with a loaded model), which predict on a list of the form [(filename1, inputs1), (filename2, inputs2)].
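For concreteness, here is a minimal sketch of one way such a list could be split across N actors. The helper name `split_into_batches` and the contiguous-chunk strategy are assumptions for illustration; the post does not say how the batches are actually formed.

```python
from typing import Any, List, Tuple

# (filename, inputs) pair, as in the list described above.
Item = Tuple[str, Any]

def split_into_batches(items: List[Item], n_actors: int) -> List[List[Item]]:
    """Split items into n_actors contiguous batches whose sizes differ by at most one."""
    if n_actors < 1:
        raise ValueError("n_actors must be >= 1")
    base, extra = divmod(len(items), n_actors)
    batches = []
    start = 0
    for i in range(n_actors):
        # The first `extra` batches each take one additional item.
        size = base + (1 if i < extra else 0)
        batches.append(items[start:start + size])
        start += size
    return batches

items = [("a.npy", 1), ("b.npy", 2), ("c.npy", 3)]
batches = split_into_batches(items, 2)
```

Each batch would then go to one actor, e.g. `actor.predict.remote(batch)`. If there are more actors than items, the trailing batches are simply empty.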
For each prediction, I collect the results in the same (filename, result) form and return them:
return [(file, ray.put(result)) for (file, result) in zip([filename1, filename2], [y1, y2])]
These tasks are created inside a function. At the end of the function, it collects the results of all predictions from the N actors and returns the combined list of (filename, object_reference) pairs, which is then passed to another function that saves them to files in parallel.
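A minimal sketch of the collection step, assuming each actor's predict call returns a list of (filename, ref) pairs like the return statement above. The name `collect_results` is hypothetical, and plain strings stand in for ObjectRefs so the snippet runs without Ray. One relevant detail: under Ray's ownership model, an object created with `ray.put` is owned by the worker that called `ray.put`, which in this setup is the actor process rather than the driver.

```python
from itertools import chain
from typing import Any, List, Tuple

# (filename, result) pair; in the real pipeline the second element would be a ray.ObjectRef.
Pair = Tuple[str, Any]

def collect_results(per_actor: List[List[Pair]]) -> List[Pair]:
    """Concatenate the per-actor result lists into one flat list, preserving actor order."""
    return list(chain.from_iterable(per_actor))

# Plain strings stand in for ObjectRefs so this runs without Ray.
per_actor = [[("a.npy", "ref_a"), ("b.npy", "ref_b")], [("c.npy", "ref_c")]]
collected = collect_results(per_actor)
```

In the actual pipeline, `per_actor` would come from something like `ray.get([a.predict.remote(b) for a, b in zip(actors, batches)])`, where `actors` and `batches` are illustrative names.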
I keep getting the following error:
ObjectLostError: Object c37ebc3b266087738b206d90405dd14436e13cba1400000004000000 is lost due to node failure.
After a lot of debugging, I found that if I return the list of actors out of the function, the error goes away, which suggests that the object references are somehow tied to the actors.
I also tried wrapping the object references in lists so that the references would be kept alive when I return them from the function, but that did not help.
I am not sure if this is a bug or not, but it seems counterintuitive.
I believe this is also the cause of the error reported in this thread: ray.exceptions.ObjectLostError: Object xxx is lost due to node failure