Hello.
I am under the impression that ray uses Arrow Plasma Object store.
I want to store Pyarrow tables in the Ray object store.
Does ray.put(table) result in a pickle serialization? Or does it only work for numpys?
Thanks
Hi @marsupialtail!
I am under the impression that ray uses Arrow Plasma Object store.
Ray actually uses a fork of the Plasma object store that has been customized to Ray’s dataplane needs, located in the Ray repo.
I want to store Pyarrow tables in the Ray object store. Does ray.put(table) result in a pickle serialization? Or does it only work for numpys?
Yes, you can put any Python primitive/object into the object store! Ray uses pickle protocol 5, which will result in zero-copy deserialization for any object that implements the out-of-band buffer protocol. This includes Arrow arrays/tables, NumPy arrays, etc.
So putting that Arrow table into Ray’s object store copies the underlying column buffers directly into the store, without re-encoding the data itself. When you read the table back from the store, its columns reference those buffers in the object store via shared memory. This gives you zero-copy reads of the table.
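To make the out-of-band buffer idea concrete, here is a minimal sketch using only the standard library's pickle module. It does not call Ray or PyArrow and is not Ray's actual code path; the bytearray simply stands in for a column buffer, and the helper name roundtrip_out_of_band is made up for illustration.

```python
import pickle


def roundtrip_out_of_band(payload: bytearray):
    """Pickle `payload` with protocol 5, keeping its bytes out-of-band.

    Hypothetical helper for illustration only. Returns the small in-band
    pickle stream, the list of out-of-band buffers, and a view over the
    restored data.
    """
    buffers = []
    # PickleBuffer marks the data as eligible for out-of-band transfer;
    # buffer_callback collects it instead of copying it into the stream.
    meta = pickle.dumps(
        pickle.PickleBuffer(payload),
        protocol=5,
        buffer_callback=buffers.append,
    )
    # On load, the same buffers are handed back, so the restored object
    # refers to the original memory rather than a copy.
    restored = pickle.loads(meta, buffers=buffers)
    return meta, buffers, memoryview(restored)


# Stand-in for a large column buffer (1 MB of zeros).
payload = bytearray(1_000_000)
meta, buffers, view = roundtrip_out_of_band(payload)

# Writing to the original is visible through the restored view,
# which shows that no copy of the data was made.
payload[0] = 7
print(len(meta), len(buffers), view[0])
```

The in-band stream stays tiny while the large buffer travels separately. As described above, Ray applies the same idea with the out-of-band buffers placed in shared memory.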
I am primarily using this for local communication between different actors. I want to confirm that no matter the size of the object, ray.put will not move part of the object to a remote machine. Is this right?
Also, is there an API to find out how much memory the object store is already using? I know the official Plasma store doesn’t support that.