Ray cluster is not spilling memory

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.


Problem: I’m trying to process parquet data using Modin on AWS EC2, across multiple instances in a Ray cluster. After some time the Ray head node just hangs, and I end up restarting the machine.

Command to start the Ray cluster:
ray start --head --system-config='{"object_spilling_config":"{\"type\":\"filesystem\",\"params\":{\"directory_path\":\"/tmp/spill\"}}"}'
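One common pitfall with this command is quoting: the `object_spilling_config` value is itself a JSON string embedded inside JSON, so the inner quotes must be escaped, and smart quotes (from copy-pasting out of a browser) will break the parse entirely. A small stdlib-only sketch that builds the `--system-config` argument programmatically, so the escaping is always correct:

```python
import json

# The spilling config itself, as a plain Python dict.
spilling_config = {
    "type": "filesystem",
    "params": {"directory_path": "/tmp/spill"},
}

# object_spilling_config must be a JSON *string*, so it is dumped twice:
# once for the inner config, once for the outer system config.
system_config = json.dumps(
    {"object_spilling_config": json.dumps(spilling_config)}
)

# Single quotes around the value keep the shell from eating the escapes.
print(f"ray start --head --system-config='{system_config}'")
```

Round-tripping the printed value through `json.loads` twice is a quick way to verify the escaping before pasting the command into a terminal.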

More details: I’m reading a 30 GB parquet file from an S3 location. memory_usage() reports ~1000 GB once the data is loaded into a dataframe.
I’m using 4 p3.16xlarge, 1 p3dn.24xlarge, and 4 r5.16xlarge instances, which gives me about 2 TB of object_store_memory.

Am I doing something wrong? I don’t see the external storage being used for spilled objects the way it is on a single-node machine.
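One quick way to confirm whether spilling is actually happening on a node is to check whether the configured spill directory is filling up with files. A minimal sketch (the helper name is mine, and `/tmp/spill` is the path from the `ray start` command above):

```python
import os

def count_spilled_files(spill_dir="/tmp/spill"):
    """Return the number of entries in the spill directory, or None if the
    directory does not exist (i.e. nothing has been spilled there yet)."""
    if not os.path.isdir(spill_dir):
        return None
    return len(os.listdir(spill_dir))

# On a node where spilling is working, this should report a growing
# number of files while the workload runs.
print(count_spilled_files())
```

If this stays at `None` or `0` on every node while the object store is full, the spilling config most likely never took effect.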