Arrow version compatibility

Hi,

I believe that Arrow Plasma is no longer maintained under the Apache Arrow project. So I am wondering which Arrow version Ray currently supports, and whether the Plasma project is now managed internally by the Ray project?

Ray moved to cloudpickle because it is more robust for generic Python serialization. That said, Ray currently doesn’t use Arrow for serialization. We might still be using Arrow to implement higher-level abstractions, though. The Plasma store has been back-ported to Ray in order to add new features (such as object spilling) and performance optimizations.

So, is Ray still using the Arrow format? If so, is there a version compatibility requirement?

It doesn’t use the Arrow format for serialization, I believe. cc @suquark in case I am wrong.

Yes, we no longer use the Arrow format for serialization.

Does that mean Ray switched from “using the Arrow format” to “not using the Arrow format”?

Is there any reason behind this switch?

This is because:

  1. The Arrow format has poor support for serializing general Python objects.
  2. Arrow has deprecated its serializer. See arrow/serialization.pxi at daa5c18e9697a6455a7a75fec19594543c17b21e · apache/arrow · GitHub
  3. Python's pickle protocol 5 supports zero-copy serialization as well as Arrow does. See PEP 574 -- Pickle protocol 5 with out-of-band data | Python.org. So Ray now uses pickle protocol 5 for serialization.
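
To illustrate point 3, here is a minimal sketch of the out-of-band buffer mechanism from PEP 574, using only the standard library `pickle` module (Python 3.8+). It is not Ray's actual serialization code; the `payload` variable and its 1 MB size are just assumptions for the demonstration.

```python
import pickle

# Hypothetical 1 MB payload; bytearray exposes the buffer protocol.
payload = bytearray(b"x" * 1_000_000)

# In-band: the payload bytes are copied into the pickle stream itself.
in_band = pickle.dumps(pickle.PickleBuffer(payload), protocol=5)

# Out-of-band: the stream only records "a buffer goes here"; the buffer
# itself is handed to buffer_callback and never copied into the stream.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(payload),
    protocol=5,
    buffer_callback=buffers.append,  # returns None, so data stays out-of-band
)

# The consumer must supply the same buffers, in order, when loading.
restored = pickle.loads(out_of_band, buffers=buffers)

# Mutating the original shows up in the restored object: no copy was made.
payload[0] = ord("y")
shares_memory = memoryview(restored)[0] == ord("y")

print(len(in_band), len(out_of_band), shares_memory)
```

The out-of-band stream stays tiny no matter how large the payload is, so the large buffer can be placed separately (for example in shared memory) without an extra copy.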