How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have been testing with Ray for the past few months to determine its viability for our company to speed up and scale Python development and processing. I am using Ray as a standalone tool, testing locally as well as on a remote, single-node cluster (a server).
I started with what I thought should be simple tasks: submit, parallelize, and return transformations via a `ray.remote` function. But I have run into problems as my datasets grow larger: available memory keeps filling up and then objects are spilled to disk. No error is raised, but I need to prevent spilling and limit memory usage. The solution that makes sense to me is to process the tasks in chunks so that Ray doesn't hold so many objects at once. Unfortunately, I can't find a way to separate the returned Python objects from their "pinned" status in Ray, so once a chunk is processed, the object is stuck in the object store until it is deleted on the Python side. That largely defeats the purpose for me, because I am trying to aggregate the results in Python so they can be viewed, used, or stored.
My question is: what is the expected workflow to take care of this?
Should I be appending all results to a file and then deleting the Python object?
Should I be adding all values to a separate database and then deleting the Python object?
Can Ray Data (`ray[data]`) play a role in solving this?
Here is my current, simplified usage flow:
1. Spawn `n` actors, where `n = num_cpus`.
2. Add all the actors to a `ray.util.ActorPool()`.
3. Submit chunks of the task list to the pool using `pool.map_unordered()` or `pool.submit()`.
4. Collect results either in the chunks they were submitted (`[self.results.append(x) for x in results]`) or eagerly as they complete (`while self.pool.has_next(): self.results.append(self.pool.get_next_unordered())`), depending on how they were submitted.
5. Kill all of the actors in the pool using `ray.kill()`.
I suspect I have additional memory problems, as Ray seems to hold a lot of memory even after the job is completed and `ray memory` shows no objects in use. The memory is released only when I stop the Ray cluster.
Any feedback or help would be useful as I am quite new to this. I’m happy to provide more info if needed.