How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
I have a workflow with around 1000 independent tasks. Each task runs through the same function and writes a parquet file.
import ray

@ray.remote
def remote_run(i):
    # do things
    # write parquet
    ...

# submit one task per input, then wait for all of them
tasks = [remote_run.remote(i) for i in range(1000)]
ray.get(tasks)
Before running the above, I call ray.init with:
ray.init(num_cpus=16, object_store_memory=16 * 1024**3)  # 16 GiB, given in bytes
My memory usage still shoots up, even after adding gc.collect() as the last step of the function.
Is there a better way to do this?