More efficient way to handle an OOM and free up memory

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I have a workflow with around 1000 independent tasks, each running through the same function and writing out a Parquet file.

import ray

@ray.remote
def remote_run(i):
    # do things
    # write parquet
    ...

tasks = [remote_run.remote(i) for i in range(1000)]
ray.get(tasks)

When I execute the above, I call ray.init with

ray.init(num_cpus=16, object_store_memory=16 * 1024**3)  # 16 GB, in bytes

I still see my memory usage shoot up, even after adding gc.collect() as the last step of the function.
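The gc.collect step is the last line of the remote function, roughly:

import gc
import ray

@ray.remote
def remote_run(i):
    # do things
    # write parquet
    gc.collect()  # release Python-level garbage before the worker picks up its next task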

Is there a better way to do this?

Can you clarify a bit? What kind of issue did you see: high memory usage, a fast memory increase, or an OOM?
Are these tasks memory-heavy? Do they share objects or use independent objects (it looks like the latter)?

High memory usage. Memory-heavy, independent objects.
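For memory-heavy, independent tasks, one common way to bound peak memory is backpressure: cap the number of in-flight tasks with ray.wait so only a limited number are submitted and holding memory at once. A minimal sketch of that pattern (MAX_IN_FLIGHT is a made-up knob to tune against how much memory each task needs):

import ray

ray.init(num_cpus=16, object_store_memory=16 * 1024**3)

@ray.remote
def remote_run(i):
    # do things
    # write parquet
    ...

MAX_IN_FLIGHT = 16  # hypothetical cap; lower it if per-task peak memory is large

in_flight = []
for i in range(1000):
    if len(in_flight) >= MAX_IN_FLIGHT:
        # block until at least one task finishes before submitting another
        done, in_flight = ray.wait(in_flight, num_returns=1)
        ray.get(done)  # surface any error from the finished task
    in_flight.append(remote_run.remote(i))

ray.get(in_flight)  # wait for the remaining tasks

Alternatively, since the scheduler runs roughly num_cpus tasks concurrently here, declaring the tasks as @ray.remote(num_cpus=2) (or lowering num_cpus in ray.init) reduces parallelism and therefore total peak heap usage, at the cost of throughput.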