How severe does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have a parquet file on HDFS that I am trying to read with Ray Dataset, and I am running out of memory on a 64GB node. It's strange because the file on HDFS is only 3GB, but since the columns are stored as strings, I suspect the data may simply be exploding in memory once decoded. However, the error message I receive is strange:
2022-08-19 08:06:35,929 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::_StatsActor.record_task() (pid=758, ip=10.245.115.127, repr=<ray.data.impl.stats._StatsActor object at 0x7ff8fa191af0>)
ray._private.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node k4aczpfr75r7ctfy is used (59.29 / 59.51 GB). The top 10 memory consumers are:

PID   MEM       COMMAND
652   57.85GiB  ray::IDLE
206   0.41GiB   /home/cdsw/.conda/envs/ray_env/bin/python -m ipykernel_launcher -f /home/cdsw/.local/share/jupyter/r
461   0.11GiB   /home/cdsw/.conda/envs/ray_env/bin/python -u /home/cdsw/.local/lib/python3.9/site-packages/ray/dashb
561   0.1GiB    /home/cdsw/.conda/envs/ray_env/bin/python -u /home/cdsw/.local/lib/python3.9/site-packages/ray/dashb
758   0.06GiB   ray::_StatsActor
941   0.06GiB   ray::IDLE
164   0.06GiB   /usr/local/bin/python3.6 /usr/local/bin/jupyter-lab --no-browser --ip=127.0.0.1 --port=8090 --Notebo
446   0.05GiB   /home/cdsw/.conda/envs/ray_env/bin/python -u /home/cdsw/.local/lib/python3.9/site-packages/ray/autos
98    0.05GiB   /usr/local/bin/python3.6 /var/lib/cdsw/python3-engine-deps/bin/ipython3 kernel --automagic --no-secu
453   0.05GiB   /home/cdsw/.conda/envs/ray_env/bin/python -m ray.util.client.server --address=10.245.115.127:6379 --

In addition, up to 0.1 GiB of shared memory is currently being used by the Ray object store.
This shows almost all of the memory being held by a ray::IDLE process? Also:
[2022-08-19 08:02:20,228 E 652 652] plasma_store_provider.cc:132: Failed to put object 32d950ec0ccf9d2affffffffffffffffffffffff0100000001000000 in object store because it is full. Object size is 562617392276 bytes.
Plasma store status:
(global lru) capacity: 18795524505
(global lru) used: 0%
(global lru) num objects: 0
(global lru) num evictions: 0
(global lru) bytes evicted: 0
The object store is reported as full, yet its stats show 0% used and zero objects? The reported object size (562617392276 bytes, roughly 524 GiB) is also far larger than both the 3GB file and the ~18.8 GB object store capacity.
Maybe this is just a misleading error message and the underlying issue really is the parquet file exploding in memory due to the string types. Is there a way to sample just a few records of the parquet file to see how much space they would take? I tried take(10), but it seems the whole file is still loaded into memory (which in turn causes the crash).