How much heap/obj_store memory to use for a task to download a file?

I’m building a data pipeline to process a few million relatively small Parquet files stored in S3. Each file is around 20-30 KB compressed and around 20x that once loaded into RAM. Since I know the size of each file beforehand, I want to set logical memory resources for each download task.

My question is: if my file needs 200mb heap memory, do I also need to reserve that much for obj_store memory, since my understanding is that Ray puts the file there for downstream processing? What happens to the object in heap after it is moved to obj_store, is the memory freed? What is the general rule of thumb here to avoid overallocating memory?

Hi @dirtyValera

For your questions:

My question is: if my file needs 200mb heap memory, do I also need to reserve that much for obj_store memory, since my understanding is that Ray puts the file there for downstream processing? …What is the general rule of thumb here to avoid overallocating memory?

You need to reserve at least 200 MB of object store memory. That said, it’s recommended to keep the default object store configuration, which is fine-tuned for memory-intensive use cases, to avoid spilling.

What happens to the object in heap after it is moved to obj_store, is the memory freed?

Yes, once the function returns, the heap memory is freed.
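
For illustration, here is a minimal sketch of sizing the per-task `memory` request from the known file size. The helper `estimate_task_memory_bytes`, the 20% headroom, and the `download`/`submit_download` names are assumptions made for this example, not part of Ray; only the 20x expansion factor comes from the question. The `memory` option is a logical resource that Ray uses for scheduling, so treat the number as a hint rather than a hard limit.

```python
try:
    import ray
except ImportError:  # lets the sizing helper run even where Ray is not installed
    ray = None

KB = 1024


def estimate_task_memory_bytes(
    compressed_bytes: int,
    expansion_factor: int = 20,  # from the question: ~20x larger in RAM
    headroom_pct: int = 20,      # assumed safety margin, tune for your data
) -> int:
    """Hypothetical helper: rough heap request for one download task."""
    return compressed_bytes * expansion_factor * (100 + headroom_pct) // 100


if ray is not None:

    @ray.remote
    def download(path: str):
        # Placeholder: a real task would read the Parquet file from S3 here.
        # The returned value is stored in the object store; the worker's
        # heap copy is freed once the function returns.
        raise NotImplementedError("read the file from S3 here")

    def submit_download(path: str, compressed_bytes: int):
        # Request heap memory per call, based on the known file size.
        request = estimate_task_memory_bytes(compressed_bytes)
        return download.options(memory=request).remote(path)
```

With a 30 KB file, `estimate_task_memory_bytes(30 * KB)` gives 737,280 bytes, so each task asks the scheduler for well under 1 MB rather than a fixed, oversized amount.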