I wonder why the size of the original data is larger than its copies? If it had something to do with NumPy array views, then making a change in the original data would be reflected in its views. However, this is not the case.
import sys
import numpy as np
import ray

ray.init()

@ray.remote
def foo(x):
    return x  # returns its argument, so the result is stored as a new object

source = np.zeros(10000)
o_ref = ray.put(source)   # store a copy in the object store
copy1 = ray.get(o_ref)    # get the stored copy
o_ref2 = foo.remote(o_ref)
copy2 = ray.get(o_ref2)   # get another copy via a remote task
print("size of original data =", sys.getsizeof(source))
print("size of data copy1 =", sys.getsizeof(copy1))
print("size of data copy2 =", sys.getsizeof(copy2))
source[0] = 555           # mutate the original array
print("original data =", source)
print("data copy1 =", copy1)
print("data copy2 =", copy2)
size of original data = 80112
size of data copy1 = 112
size of data copy2 = 112
original data = [555. 0. 0. ... 0. 0. 0.]
data copy1 = [0. 0. 0. ... 0. 0. 0.]
data copy2 = [0. 0. 0. ... 0. 0. 0.]
In the example, copy1 and copy2 are the outputs of ray.put and the @ray.remote task, respectively. The array data itself is stored in the object store and is only referenced by the consumer (in this example, the driver script). On the Python side, each result is just a thin wrapper around that shared buffer, and the wrapper is all sys.getsizeof measures — it cannot see the size of the referenced buffer.
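This effect is not Ray-specific: sys.getsizeof on any NumPy array that does not own its buffer reports only the small array-header size. A minimal Ray-free sketch using a plain NumPy view:

```python
import sys
import numpy as np

owner = np.zeros(10000)  # owns its 80,000-byte float64 buffer
view = owner[:]          # shares the same buffer, owns nothing

print(sys.getsizeof(owner))       # header + 80,000-byte buffer
print(sys.getsizeof(view))        # header only (roughly 112 bytes)
print(owner.nbytes, view.nbytes)  # both report 80000 bytes of data
```

`ndarray.nbytes` is the reliable way to ask how much data an array refers to, regardless of who owns the buffer.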
@ClarenceNg, if you look, for example, at the flags of source and copy1, you will see that copy1 doesn’t own its data. Are you sure that the owner of the data is the Ray object store (plasma or in-process memory)?
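The flags check referred to here can also be reproduced without Ray: an array built on top of someone else’s buffer reports OWNDATA as False and exposes the owner through its .base attribute. (In the Ray case, copy1.base would presumably wrap a shared-memory buffer rather than another ndarray — that is an assumption about the internals, not verified here.) A plain-NumPy sketch:

```python
import numpy as np

source = np.zeros(10000)
print(source.flags["OWNDATA"])  # True  -- source owns its buffer

view = source[:]                # basic slicing creates a view
print(view.flags["OWNDATA"])    # False -- the buffer belongs to source
print(view.base is source)      # True  -- .base points at the owner
```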
I think it is because the array is zero-copied: when ray.get is called, only the metadata is copied into user-space memory, while the data buffer stays in the object store. Also, Ray objects are immutable, so although you change the original data, the change is not reflected in the objects in the plasma store.
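You can see this immutability directly on the array returned by ray.get: its WRITEABLE flag is False. A Ray-free sketch of the same situation, zero-copy-wrapping an immutable bytes object (standing in for a plasma buffer, an assumption for illustration):

```python
import numpy as np

buf = bytes(80000)                          # stand-in for an immutable shared buffer
arr = np.frombuffer(buf, dtype=np.float64)  # zero-copy wrap; inherits read-only-ness

print(arr.flags["WRITEABLE"])               # False
try:
    arr[0] = 555                            # any write attempt fails
except ValueError as e:
    print("cannot mutate:", e)
```

So even if the original source array is mutated afterwards, the stored object (and arrays viewing it) cannot change.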