How severe does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Hi there, can someone explain what's going on here? I’ve read that "Ray supports zero-copy-read for numpy (for float or integer types) array."
but I’m not sure whether that is the same thing as what I'm seeing.
I have self-play workers that perform rollouts and return NumPy arrays. In the main thread these results are ObjectRefs, as expected. I then pass them to a replay buffer using its add method, but I notice a significant delay between when the workers finish and when the buffer adds them. This led me to wonder whether the arrays themselves are being passed to the buffer rather than the references, and now I’m somewhat confused about what's going on here. Could someone explain?
Here’s the code:
ps = Policy_Server.remote(DEVICE)
sp = self_play_worker.options(max_concurrency=MAX_WORKERS).remote(ps, DEPTH)
buffer = ReplayBuffer.options(max_concurrency=2).remote(capacity=CAPACITY)
sp_refs = [sp.self_play.remote() for _ in range(GAMES_TOTAL)]
buffer_refs = []
start_time = time.time()
# ? Preload the ReplayBuffer
print(f"Number of Jobs Running: {len(sp_refs)=}")
while sp_refs:
    complete, sp_refs = ray.wait(sp_refs)
    print(f"{complete=}")
    buffer.add.remote(complete)
With the above, I get the following so far, which is what I expected:
Number of Jobs Running: len(sp_refs)=1
complete=[ObjectRef(f91b78d7db9a659302e1adadcf69844dfe0a8bc60100000001000000)]
The buffer's code, however, is:
def add(self, experience: ray.ObjectRef):
    print(f"{experience} - add function")
    print(f"{type(experience)=}")
    print(f"{type(experience[0])}")
    data = ray.get(experience)
    print(data)
    print(type(data))
    assert False
This gives the following printout:
ValueError: Invalid type of object refs, <class 'numpy.ndarray'>, is given. 'object_refs' must either be an ObjectRef or a list of ObjectRefs.
(ReplayBuffer pid=527599) [ObjectRef(f91b78d7db9a659302e1adadcf69844dfe0a8bc60100000001000000)] - add function
(ReplayBuffer pid=527599) type(experience)=<class 'list'>
(ReplayBuffer pid=527599) <class 'ray._raylet.ObjectRef'>
(ReplayBuffer pid=527599) [array([[1.000e+00, 1.000e+00, 1.000e+00, ..., 0.000e+00, 1.829e+03,
(ReplayBuffer pid=527599) 0.000e+00],
(ReplayBuffer pid=527599) [1.000e+00, 1.000e+00, 1.000e+00, ..., 0.000e+00, 1.837e+03,
(ReplayBuffer pid=527599) 0.000e+00],
(ReplayBuffer pid=527599) [1.000e+00, 1.000e+00, 1.000e+00, ..., 0.000e+00, 2.997e+03,
(ReplayBuffer pid=527599) 0.000e+00],
(ReplayBuffer pid=527599) ...,
(ReplayBuffer pid=527599) [0.000e+00, 0.000e+00, 0.000e+00, ..., 0.000e+00, 4.105e+03,
(ReplayBuffer pid=527599) 0.000e+00],
(ReplayBuffer pid=527599) [0.000e+00, 0.000e+00, 0.000e+00, ..., 0.000e+00, 3.510e+03,
(ReplayBuffer pid=527599) 0.000e+00],
(ReplayBuffer pid=527599) [0.000e+00, 0.000e+00, 0.000e+00, ..., 0.000e+00, 2.358e+03,
(ReplayBuffer pid=527599) 0.000e+00]])]
(ReplayBuffer pid=527599) <class 'list'>
(ReplayBuffer pid=527599) [[1.000e+00 1.000e+00 1.000e+00 ... 0.000e+00 1.829e+03 0.000e+00]
(ReplayBuffer pid=527599) [1.000e+00 1.000e+00 1.000e+00 ... 0.000e+00 1.837e+03 0.000e+00]
(ReplayBuffer pid=527599) [1.000e+00 1.000e+00 1.000e+00 ... 0.000e+00 2.997e+03 0.000e+00]
(ReplayBuffer pid=527599) ...
(ReplayBuffer pid=527599) [0.000e+00 0.000e+00 0.000e+00 ... 0.000e+00 4.105e+03 0.000e+00]
(ReplayBuffer pid=527599) [0.000e+00 0.000e+00 0.000e+00 ... 0.000e+00 3.510e+03 0.000e+00]
(ReplayBuffer pid=527599) [0.000e+00 0.000e+00 0.000e+00 ... 0.000e+00 2.358e+03 0.000e+00]] - add function
(ReplayBuffer pid=527599) type(experience)=<class 'numpy.ndarray'>
(ReplayBuffer pid=527599) <class 'numpy.ndarray'>
It looks like the data at least reaches the buffer as an object reference, but I don't understand why it doesn't stay that way, given that add seems to be called twice: once with an object reference and once with a NumPy array (without ray.get() being called).