The size of original data is larger than its copy

YarShev · January 31, 2023, 10:31am

Hi,

I wonder why the size of the original data is larger than the size of its copy. If it had something to do with NumPy array views, then a change to the original data would be reflected in the view. However, this is not the case.

import ray
import sys
import numpy as np

ray.init()

source = np.zeros(10000)
o_ref = ray.put(source)  # make a copy
copy1 = ray.get(o_ref)  # get the copy

@ray.remote
def foo(data):
    return data

o_ref2 = foo.remote(o_ref)
copy2 = ray.get(o_ref2) # get another copy
print("size of original data =", sys.getsizeof(source))
print("size of data copy1 =", sys.getsizeof(copy1))
print("size of data copy2 =", sys.getsizeof(copy2))

source[0] = 555
print("original data =", source)
print("data copy1 =", copy1)
print("data copy2 =", copy2)

# output
size of original data = 80112
size of data copy1 = 112
size of data copy2 = 112
original data = [555.   0.   0. ...   0.   0.   0.]
data copy1 = [0. 0. 0. ... 0. 0. 0.]
data copy2 = [0. 0. 0. ... 0. 0. 0.]

Thanks in advance.

YarShev · January 31, 2023, 8:53pm

@yic, @sangcho, any comments/thoughts?

ClarenceNg · February 2, 2023, 8:49am

@YarShev

In the example, copy1 comes from ray.get on the ray.put reference and copy2 from ray.get on the result of the @ray.remote task. Both outputs are stored in the object store, and the consumer (here, the script) only references them. Since the data lives in the object store, the Python object is just a reference to it, and that is what sys.getsizeof sees: it cannot see the size of the referenced data.
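The same effect can be reproduced without Ray. The following is a minimal sketch, assuming only NumPy: an array that does not own its buffer reports just its header to sys.getsizeof, while nbytes still reports the size of the data it points to. The name `view` is illustrative and not from the original post.

```python
import sys
import numpy as np

source = np.zeros(10000)   # owns its buffer: 10000 float64 values = 80000 bytes
view = source[:]           # shares the buffer, OWNDATA is False

owner_size = sys.getsizeof(source)   # array header + owned data buffer
view_size = sys.getsizeof(view)      # array header only, buffer is not counted

print("owner:", owner_size)
print("view: ", view_size)
print("view.nbytes:", view.nbytes)   # size of the referenced data
print("view owns data:", view.flags.owndata)
```

To compare the actual data size of two arrays, nbytes is the more reliable number, since it does not depend on which object owns the buffer.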

YarShev · February 4, 2023, 7:46pm

@ClarenceNg, if you look, for example, at the flags of source and copy1, you will see that copy1 doesn’t own the data. Are you sure that the owner of the data is a Ray object store (plasma or in-process memory)?

print(source.flags)
#   C_CONTIGUOUS : True
#   F_CONTIGUOUS : True
#   OWNDATA : True
#   WRITEABLE : True
#   ALIGNED : True
#   WRITEBACKIFCOPY : False

print(copy1.flags)
#   C_CONTIGUOUS : True
#   F_CONTIGUOUS : True
#   OWNDATA : False
#   WRITEABLE : False
#   ALIGNED : True
#   WRITEBACKIFCOPY : False

sangcho · February 7, 2023, 9:26am

I think it is because it is zero-copied. We only copy the metadata into user-space memory when ray.get is called. Also, Ray objects are immutable, so although you change the original data, the change is not reflected in the objects in the plasma store.
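As a rough illustration of that behaviour (not Ray's actual implementation), the sketch below uses an immutable bytes object as a stand-in for the object store's copy and wraps it with np.frombuffer, which does not copy the data. The name `stored` is hypothetical.

```python
import sys
import numpy as np

source = np.zeros(10000)

# Stand-in for ray.put: one copy of the data into an immutable buffer.
stored = source.tobytes()

# Stand-in for ray.get: a zero-copy, read-only array over that buffer.
copy1 = np.frombuffer(stored, dtype=source.dtype)

source[0] = 555                      # modifies only the original buffer
print("copy1[0] =", copy1[0])        # still 0.0
print("OWNDATA  =", copy1.flags.owndata)     # False
print("WRITEABLE =", copy1.flags.writeable)  # False
print("getsizeof =", sys.getsizeof(copy1), "nbytes =", copy1.nbytes)
```

This matches the flags and sizes shown earlier in the thread: the array returned to the script is a small header over memory it does not own, and the stored copy is independent of source.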
