How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
- Low: It annoys or frustrates me for a moment.
- Medium: It adds significant difficulty to completing my task, but I can work around it.
- High: It blocks me from completing my task.
I tried reading multiple text files using Ray's Dataset API and noticed that for every file read, Ray created two objects, each roughly equal in size to the input file. This does not happen when I put a NumPy array into the object store with ray.put. I am attaching my code and the corresponding memory summary output below:
import ray
from ray.internal.internal_api import memory_summary
ray.init(address="auto")
# dir_path currently contains 3 text files of ~850 MiB each
dir_path = "input/files/path"
ds = ray.data.read_text(dir_path)
print(ds.count())
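# Inspect object store usage after the read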
print(memory_summary())
ray.shutdown()
The output of the above code is as follows:
1532898
Grouping by node address... Sorting by object size... Display all entries per group...
--- Summary for node address: 127.0.0.1 ---
Mem Used by Objects   Local References      Pinned        Pending Tasks   Captured in Objects   Actor Handles
5365140705.0 B        6, (5365140705.0 B)   0, (0.0 B)    0, (0.0 B)      0, (0.0 B)            2, (0.0 B)
--- Object references for node address: 127.0.0.1 ---
IP Address  PID    Type    Call Site  Size  Reference Type  Object Ref
127.0.0.1  61351  Worker  disabled  ?  ACTOR_HANDLE  ffffffffffffffff54c910c96952127076bc5c6d0100000001000000
127.0.0.1  61347  Driver  (actor call) | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/stats.py:get_or_create_stats_actor:122 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/read_api.py:read_datasource:226 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/read_api.py:read_binary_files:607  ?  ACTOR_HANDLE  ffffffffffffffff54c910c96952127076bc5c6d0100000001000000
127.0.0.1  61347  Driver  (task call) | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/read_api.py:<lambda>:275 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/lazy_block_list.py:_get_or_compute:149 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/lazy_block_list.py:__next__:133  889484831.0 B  LOCAL_REFERENCE  f4402ec78d3a2607ffffffffffffffffffffffff0100000001000000
127.0.0.1  61347  Driver  (task call) | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/compute.py:<listcomp>:85 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/compute.py:apply:85 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/plan.py:__call__:285  890501944.0 B  LOCAL_REFERENCE  8849b62d89cb30f9ffffffffffffffffffffffff0100000001000000
127.0.0.1  61347  Driver  (task call) | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/read_api.py:<lambda>:275 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/lazy_block_list.py:_get_or_compute:149 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/lazy_block_list.py:__next__:133  893689159.0 B  LOCAL_REFERENCE  e0dc174c83599034ffffffffffffffffffffffff0100000001000000
127.0.0.1  61347  Driver  (task call) | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/compute.py:<listcomp>:85 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/compute.py:apply:85 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/plan.py:__call__:285  894711144.0 B  LOCAL_REFERENCE  82891771158d68c1ffffffffffffffffffffffff0100000001000000
127.0.0.1  61347  Driver  (task call) | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/read_api.py:<lambda>:275 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/lazy_block_list.py:__init__:41 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/read_api.py:read_datasource:279  897863466.0 B  LOCAL_REFERENCE  32d950ec0ccf9d2affffffffffffffffffffffff0100000001000000
127.0.0.1  61347  Driver  (task call) | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/compute.py:<listcomp>:85 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/compute.py:apply:85 | /opt/anaconda3/envs/snappy-fix/lib/python3.7/site-packages/ray/data/impl/plan.py:__call__:285  898890161.0 B  LOCAL_REFERENCE  f91b78d7db9a6593ffffffffffffffffffffffff0100000001000000
To record callsite information for each ObjectRef created, set env variable RAY_record_ref_creation_sites=1
--- Aggregate object store stats across all nodes ---
Plasma memory usage 2550 MiB, 3 objects, 124.55% full, 41.47% needed
Plasma filesystem mmap usage: 1701 MiB
Spilled 4267 MiB, 5 objects, avg write throughput 644 MiB/s
Objects consumed by Ray tasks: 2556 MiB.
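For comparison, here is a minimal sketch of the ray.put experiment mentioned above (the array shape and variable names are only illustrative; the array is just sized to roughly match one ~850 MiB input file). In my runs the memory summary then shows a single object of about the array's size, rather than two objects per input:

import numpy as np
import ray
from ray.internal.internal_api import memory_summary

ray.init(address="auto")

# ~850 MiB of float64 values, roughly the size of one input file (illustrative).
arr = np.zeros(850 * 1024 * 1024 // 8, dtype=np.float64)

# Put the array into the object store once and keep the reference alive.
ref = ray.put(arr)

# Unlike the read_text case, this reports a single ~850 MiB object.
print(memory_summary())

ray.shutdown()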
The Ray version is 2.0.0 (nightly), but the same behavior is observed with Ray 1.9.0 as well.
The object store memory is set to 2 GiB (the maximum on Mac), each input file is approximately 850 MiB, and there are 3 input files in the scenario above.
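For completeness, a minimal sketch of how the 2 GiB object store limit corresponds to starting Ray locally (the exact arguments here are illustrative, since the script above connects to an already running cluster via address="auto"):

import ray

# Start a local Ray instance with the object store capped at 2 GiB
# (2 GiB is also the default cap on macOS). Illustrative only; the
# reproduction script attaches to an existing cluster instead.
ray.init(object_store_memory=2 * 1024**3)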
Any idea why this is happening? Is this due to some sort of replication within Ray? If so, how can we disable it?