Why is Ray spilling objects to disk even though there is enough memory?

Hi there,

I’m trying to use an actor pool to load many small CSV files (>100GB in total) into NumPy arrays and keep them in the shared-memory object store for later use. What I’m seeing is that Ray starts writing these arrays to disk very early, even though plenty of memory is still available. I started the cluster with --object-store-memory 200000000000 (I do have this much RAM) and --plasma-directory "/tmp". Tweaking these two settings didn’t help at all. Is there any way to stop Ray from automatically spilling to disk, or is there another setting I might have missed? Thanks

Ray currently doesn’t write plasma objects to disk unless you enable the disk spilling feature.

Where do you find plasma objects that are spilled to disk?

I’m using v1.1.0 which I suppose doesn’t support spilling yet. I found the folder size of the plasma-directory I specified was growing while loading data. The size of the directory plus the memory used is approximately equal to the size of data I wanted to load into memory. So I believe there was some spillover happening.

I also checked raylet.out as indicated here: Memory Management — Ray v2.0.0.dev0

I got a bunch of these logs:
[2021-01-18 02:00:56,497 W 33270 33280] client_connection.cc:395: [worker]ProcessMessage with type PlasmaCreateRequest took 308 ms.
[2021-01-18 02:01:08,372 W 33270 33280] client_connection.cc:395: [worker]ProcessMessage with type PlasmaCreateRequest took 1499 ms.
[2021-01-18 02:01:10,595 W 33270 33280] client_connection.cc:395: [worker]ProcessMessage with type PlasmaCreateRequest took 112 ms.
[2021-01-18 02:01:19,703 W 33270 33280] client_connection.cc:395: [worker]ProcessMessage with type PlasmaCreateRequest took 1323 ms.
[2021-01-18 02:01:30,554 W 33270 33280] client_connection.cc:395: [worker]ProcessMessage with type PlasmaCreateRequest took 302 ms.
[2021-01-18 02:01:46,355 W 33270 33280] client_connection.cc:395: [worker]ProcessMessage with type PlasmaCreateRequest took 283 ms.
[2021-01-18 02:01:49,412 W 33270 33280] client_connection.cc:395: [worker]ProcessMessage with type PlasmaCreateRequest took 894 ms.
[2021-01-18 02:02:06,121 W 33270 33280] client_connection.cc:395: [worker]ProcessMessage with type PlasmaCreateRequest took 123 ms.

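Hey, to clarify: what’s happening here is that Ray is mmapping the object store to a file in /tmp (i.e., it uses the disk as the backing store for plasma and demand-pages its contents into memory). That is why the plasma directory grows as you load data, even though no spilling is involved.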
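To illustrate the mechanism, here is a minimal sketch using only the Python standard library. It is not Ray's code; the function name write_through_mmap is made up for this example. It maps a file in a chosen directory into memory, writes to the mapping as if it were ordinary memory, and then reads the same bytes back from the file on disk:

```python
import mmap
import os
import tempfile


def write_through_mmap(directory: str, size: int, payload: bytes) -> bytes:
    """Map a file in `directory`, write `payload` via the mapping, read it back from the file.

    Hypothetical helper for illustration only; this is not how Ray is implemented.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)  # reserve `size` bytes in the backing file
        with mmap.mmap(fd, size) as mm:
            mm[: len(payload)] = payload  # looks like a plain memory write
            mm.flush()  # push the dirty pages to the backing file
        with open(path, "rb") as f:
            return f.read(len(payload))  # the bytes are now in the file
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(write_through_mmap(tempfile.gettempdir(), 4096, b"hello"))
```

If the directory lives on a real disk, as /tmp often does, the mapped pages are backed by that disk. If it lives on a RAM-backed filesystem such as /dev/shm, they stay in memory.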

Thanks for the clarification. Is there any way to disable it to use only memory?

Yeah, you can prevent it by not specifying a plasma directory, so that Ray will use /dev/shm.
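Before relying on /dev/shm, it may be worth checking that it is large enough for the object store size you request, since on many Linux systems it defaults to a fraction of RAM. The sketch below uses only the standard library; the helper name has_free_space is made up for this example, and the 200000000000 figure is simply the value from the question:

```python
import os
import shutil


def has_free_space(path: str, requested_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `requested_bytes` free.

    Hypothetical helper for illustration; not part of Ray.
    """
    return shutil.disk_usage(path).free >= requested_bytes


if __name__ == "__main__":
    shm = "/dev/shm"
    wanted = 200_000_000_000  # object store size from the question
    if os.path.isdir(shm):
        print(f"{shm} can hold {wanted} bytes: {has_free_space(shm, wanted)}")
    else:
        print(f"{shm} does not exist on this system")
```

If the check passes, starting the cluster with the same --object-store-memory value but without --plasma-directory should keep the object store in shared memory.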