The size of the original data is larger than its copy

Hi,

I wonder why the size of the original data is larger than the size of its copy. If this had something to do with NumPy array views, then a change made to the original data would be reflected in its view. However, this is not the case.

import ray
import sys
import numpy as np

ray.init()

source = np.zeros(10000)
o_ref = ray.put(source)  # make a copy
copy1 = ray.get(o_ref)  # get the copy

@ray.remote
def foo(data):
    return data

o_ref2 = foo.remote(o_ref)
copy2 = ray.get(o_ref2) # get another copy
print("size of original data =", sys.getsizeof(source))
print("size of data copy1 =", sys.getsizeof(copy1))
print("size of data copy2 =", sys.getsizeof(copy2))

source[0] = 555
print("original data =", source)
print("data copy1 =", copy1)
print("data copy2 =", copy2)

# output
size of original data = 80112
size of data copy1 = 112
size of data copy2 = 112
original data = [555.   0.   0. ...   0.   0.   0.]
data copy1 = [0. 0. 0. ... 0. 0. 0.]
data copy2 = [0. 0. 0. ... 0. 0. 0.]

Thanks in advance.

@yic, @sangcho, any comments/thoughts?

@YarShev

In the example, copy1 and copy2 are the outputs of ray.put and the @ray.remote function, respectively. Both outputs are stored in the object store, and the consumer (in this example, the script) only references them. Since the object is stored in the object store, what Python holds is just a reference, and that is what getsizeof sees: it cannot see the size of the referenced object.
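To illustrate the getsizeof point without involving Ray at all, here is a minimal sketch that uses a plain NumPy slice as a stand-in for an array that does not own its buffer. The slice is my assumption for illustration only; it is not how Ray builds the array returned by ray.get. It shows that sys.getsizeof counts the data buffer only for the array that owns it, while nbytes reports the buffer size in both cases.

```python
import sys
import numpy as np

source = np.zeros(10000)  # float64 array that owns its data buffer
view = source[:]          # stand-in: shares the buffer, does not own it

# OWNDATA tells us which array is responsible for the buffer.
print("OWNDATA:", source.flags.owndata, view.flags.owndata)

# getsizeof includes the buffer only for the owning array.
print("getsizeof:", sys.getsizeof(source), sys.getsizeof(view))

# nbytes reports the size of the data regardless of ownership.
print("nbytes:", source.nbytes, view.nbytes)
```

So if the goal is to compare how much data each array refers to, nbytes is the more useful number than sys.getsizeof.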

@ClarenceNg, if you look, for example, at the flags of source and copy1, you will see that copy1 doesn’t own the data. Are you sure that the owner of the data is a Ray object store (plasma or in-process memory)?

print(source.flags)
#   C_CONTIGUOUS : True
#   F_CONTIGUOUS : True
#   OWNDATA : True
#   WRITEABLE : True
#   ALIGNED : True
#   WRITEBACKIFCOPY : False

print(copy1.flags)
#   C_CONTIGUOUS : True
#   F_CONTIGUOUS : True
#   OWNDATA : False
#   WRITEABLE : False
#   ALIGNED : True
#   WRITEBACKIFCOPY : False

I think it is because the array is zero-copy. Only the metadata is copied into user-space memory when ray.get is called. Also, Ray objects are immutable, so although you change the original data, the change is not reflected in the objects in the plasma store.
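Here is a rough sketch of the same behaviour using only NumPy, in case it helps. The bytes snapshot below is a hypothetical stand-in for the object store (it is not Ray's actual mechanism); it just reproduces an array with OWNDATA : False and WRITEABLE : False whose contents were fixed at the time of the snapshot. It also shows that calling .copy() gives a private, writable array if you need to modify the result.

```python
import numpy as np

source = np.zeros(10000)

# Stand-in for the object store: an immutable snapshot of the data.
snapshot = source.tobytes()

# Read-only array backed by the snapshot; it does not own the buffer.
copy1 = np.frombuffer(snapshot, dtype=source.dtype)

# Changing the original afterwards does not touch the snapshot.
source[0] = 555
print("source[0] =", source[0], "copy1[0] =", copy1[0])

# Writing to the read-only array is rejected by NumPy.
try:
    copy1[0] = 1.0
    write_failed = False
except ValueError:
    write_failed = True
print("write rejected:", write_failed)

# An explicit copy owns its data and can be modified freely.
writable = copy1.copy()
writable[0] = 1.0
print("writable flags:", writable.flags.owndata, writable.flags.writeable)
```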
