How can I configure ray to never run out of memory?

John_Smith · February 28, 2021, 3:23am

I’m using Ray as the backend for Modin’s out-of-core feature. Unfortunately, I still see a memory error saying it cannot allocate memory in the object store. I realized that I’m using a lot of NumPy arrays that take up memory, but for some reason they aren’t being spilled to disk.

So I think what might be happening is that I have a mix of Ray code (Modin) and non-Ray code (NumPy), and the extra NumPy memory increases memory pressure until, at some point, Ray runs out of memory and can’t allocate new objects in the object store.

Is there a way to configure Ray to always spill to external storage or disk so the program doesn’t run out of memory?

Digging into the documentation, I found that I could possibly spill to an S3 bucket, but I’d like to know whether this is possible and whether I’m on the right track.

```python
import json

import ray

ray.init(
    _system_config={
        "max_io_workers": 4,  # More IO workers for remote storage.
        "min_spilling_size": 100 * 1024 * 1024,  # Spill at least 100MB at a time.
        "object_spilling_config": json.dumps(
            # "bucket/path" is a placeholder for your own S3 bucket and prefix.
            {"type": "smart_open", "params": {"uri": "s3://bucket/path"}},
        ),
    },
)
```

https://docs.ray.io/en/master/memory-management.html#memory-aware-scheduling

sangcho · February 28, 2021, 4:23am

Hi, disk spilling is currently recommended over S3 spilling (S3 spilling still needs performance improvements). You can try disk spilling instead: Memory Management — Ray v2.0.0.dev0
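A minimal sketch of a disk spilling setup, assuming a Ray version where `object_spilling_config` is passed as a JSON string through `_system_config` (as in the S3 example in the question) and the local "filesystem" spiller takes a `directory_path` parameter. The helper names and the `/tmp/spill` path are hypothetical.

```python
import json


def disk_spilling_system_config(spill_dir: str, max_io_workers: int = 4) -> dict:
    """Return a Ray `_system_config` dict that spills objects to `spill_dir`."""
    return {
        # IO workers used to write spilled objects to disk and restore them.
        "max_io_workers": max_io_workers,
        # JSON-encoded spilling config using the local "filesystem" backend
        # instead of "smart_open"/S3.
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": spill_dir}}
        ),
    }


def start_ray_with_disk_spilling(spill_dir: str) -> None:
    """Hypothetical helper: start Ray with disk spilling (requires Ray installed)."""
    import ray  # imported lazily so the config helper works without Ray

    ray.init(_system_config=disk_spilling_system_config(spill_dir))
```

Calling `start_ray_with_disk_spilling("/tmp/spill")` before creating any Modin DataFrames would apply the config; the directory should sit on a disk with enough free space for everything that may be spilled.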

Also, are you on Ray’s public Slack channel?

John_Smith · February 28, 2021, 7:57pm

No, I’m not on the public Slack.

I’m still getting ‘can’t add object to object store’ errors, even with disk spilling. Is there anything else I can try?

sangcho · March 1, 2021, 4:23am

Hey @devin-petersohn, do you know the current best practice in Modin (until we start testing Modin with object spilling)?

Also, @John_Smith, would you like to join the public Slack and have a 1:1 meeting with me? I’d love to see what the issue is and help get you unblocked.

Also, see the Stack Overflow question "Ray object store running out of memory using out of core. How can I configure an external object store like s3 bucket?" for more detail about object spilling.

devin-petersohn · March 1, 2021, 6:45pm

The current best practice for using Modin (installed from GitHub master) is to initialize Ray with a large plasma store and point the plasma directory to disk. That makes the object store larger than memory, and the operating system then pages objects in as needed. It’s not as efficient as it could be, but in the worst case I’ve observed it to be 50-60% slower than pure in-memory performance (despite the 10x overhead of going to disk instead of memory). Usually this is worth it.
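A rough sketch of that setup, assuming `ray.init` accepts `object_store_memory` and the private `_plasma_directory` argument (both exist in Ray 1.x, but private arguments can change between releases, and Ray may enforce its own size checks). The helper names, sizes, and paths are hypothetical.

```python
def large_plasma_init_kwargs(object_store_bytes: int, plasma_dir: str) -> dict:
    """Build ray.init() keyword arguments for a disk-backed object store.

    `object_store_memory` sets the object store size. `_plasma_directory`
    (a private ray.init argument) puts the store's backing files in
    `plasma_dir` instead of shared memory, so the store can be larger
    than RAM and the OS pages objects in and out.
    """
    if object_store_bytes <= 0:
        raise ValueError("object_store_bytes must be positive")
    return {
        "object_store_memory": object_store_bytes,
        "_plasma_directory": plasma_dir,
    }


def start_ray_with_disk_backed_store(object_store_bytes: int, plasma_dir: str) -> None:
    """Hypothetical helper: start Ray with a disk-backed store (requires Ray)."""
    import ray  # imported lazily so the kwargs helper works without Ray

    ray.init(**large_plasma_init_kwargs(object_store_bytes, plasma_dir))
```

For example, `start_ray_with_disk_backed_store(64 * 1024**3, "/mnt/disk/plasma")` would request a 64 GiB store backed by files in that (hypothetical) directory; choose a size larger than RAM but smaller than the free space on that disk.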
