How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have a Parquet file on HDFS that I am trying to read with Ray Dataset. I am running out of memory on a 64 GB node. It's strange because the file on HDFS is only 3 GB, but I suspect that since the columns are stored as strings, it may simply be exploding in memory. However, the error message I receive is strange:
2022-08-19 08:06:35,929 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::_StatsActor.record_task() (pid=758, ip=10.245.115.127, repr=<ray.data.impl.stats._StatsActor object at 0x7ff8fa191af0>)
ray._private.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node k4aczpfr75r7ctfy is used (59.29 / 59.51 GB). The top 10 memory consumers are:
PID MEM COMMAND
652 57.85GiB ray::IDLE
206 0.41GiB /home/cdsw/.conda/envs/ray_env/bin/python -m ipykernel_launcher -f /home/cdsw/.local/share/jupyter/r
461 0.11GiB /home/cdsw/.conda/envs/ray_env/bin/python -u /home/cdsw/.local/lib/python3.9/site-packages/ray/dashb
561 0.1GiB /home/cdsw/.conda/envs/ray_env/bin/python -u /home/cdsw/.local/lib/python3.9/site-packages/ray/dashb
758 0.06GiB ray::_StatsActor
941 0.06GiB ray::IDLE
164 0.06GiB /usr/local/bin/python3.6 /usr/local/bin/jupyter-lab --no-browser --ip=127.0.0.1 --port=8090 --Notebo
446 0.05GiB /home/cdsw/.conda/envs/ray_env/bin/python -u /home/cdsw/.local/lib/python3.9/site-packages/ray/autos
98 0.05GiB /usr/local/bin/python3.6 /var/lib/cdsw/python3-engine-deps/bin/ipython3 kernel --automagic --no-secu
453 0.05GiB /home/cdsw/.conda/envs/ray_env/bin/python -m ray.util.client.server --address=10.245.115.127:6379 --
In addition, up to 0.1 GiB of shared memory is currently being used by the Ray object store.
This shows that the memory is occupied by ray::IDLE? Also:
[2022-08-19 08:02:20,228 E 652 652] plasma_store_provider.cc:132: Failed to put object 32d950ec0ccf9d2affffffffffffffffffffffff0100000001000000 in object store because it is full. Object size is 562617392276 bytes.
Plasma store status:
(global lru) capacity: 18795524505
(global lru) used: 0%
(global lru) num objects: 0
(global lru) num evictions: 0
(global lru) bytes evicted: 0
The object store is full, yet at the same time empty?
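For what it's worth, just comparing the numbers in that log suggests the single object being put is vastly larger than the entire plasma store capacity, which would explain a "full" error even while the store reports 0 objects used:

```python
# Numbers taken verbatim from the error log above.
object_size = 562_617_392_276  # "Object size is 562617392276 bytes"
capacity = 18_795_524_505      # "(global lru) capacity: 18795524505"

print(f"object:   {object_size / 1024**3:.1f} GiB")  # ~524.0 GiB
print(f"capacity: {capacity / 1024**3:.1f} GiB")     # ~17.5 GiB
print(f"ratio:    {object_size / capacity:.1f}x")    # the object is ~29.9x the whole store
```

So the object Ray is trying to put (~524 GiB) could never fit, regardless of how empty the store is.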
I am thinking maybe this is just a misleading error message and the underlying issue is simply the Parquet file exploding in memory due to the string types. Is there any way I can sample just a few records of the Parquet file to see how much space they would take? I used limit() and take(10), but it seems the whole file is still loaded into memory (which in turn causes the crash).