Facing Serialization issues

Hi,

When using a Parallel Iterator on a custom object that contains PyArrow data, we are facing serialization exceptions. Please find the screenshot attached.

Ray version: 2.0.0 (dev)
Python: 3.7.9

When we debugged the Ray code inside cloudpickle_fast.py (line 668), the following snippet always returns NotImplemented:

        try:
            is_anyclass = issubclass(t, type)
        except TypeError:  # t is not a class (old Boost; see SF #502085)
            is_anyclass = False

        if is_anyclass:
            return _class_reduce(obj)
        elif isinstance(obj, types.FunctionType):
            return self._function_reduce(obj)
        else:
            # fallback to save_global, including the Pickler's
            # dispatch_table
            return NotImplemented
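
For context, returning `NotImplemented` from `reducer_override` is not itself an error: it tells the pickler to fall back to its regular machinery (which then fails on the Cython object). The standard-library `pickle` module exposes the same hook since Python 3.8; here is a minimal sketch of that contract (the `Point` and `LoggingPickler` classes are hypothetical, just for illustration):

```python
import io
import pickle

class Point:
    def __init__(self, x):
        self.x = x

class LoggingPickler(pickle.Pickler):
    """Records which objects the pickler consulted us about, then
    defers to the default machinery by returning NotImplemented --
    the same branch the cloudpickle snippet above ends in."""

    def __init__(self, file):
        super().__init__(file)
        self.seen = []

    def reducer_override(self, obj):
        self.seen.append(type(obj).__name__)
        return NotImplemented  # fall back to the default reducer

buf = io.BytesIO()
pickler = LoggingPickler(buf)
pickler.dump(Point(3))

restored = pickle.loads(buf.getvalue())
print(restored.x)  # 3 -- the fallback serialized Point normally
```

With a Cython-backed object, the fallback itself has no usable reducer, which is where the exception is raised.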

Is there any workaround to bypass this issue, or could a fix be provided after evaluation? Any help is appreciated.

cc @suquark Have you faced this error? I remember you mentioned that some PyArrow types have serialization issues. Is there a good workaround for this?

This is the first time I've seen this error. To me, it indicates that the input data contains Cython objects, which are generally not serializable because cloudpickle/pickle cannot access the bytecode hidden by Cython. A simple reproducible example would be very helpful. One workaround is to use an alternative representation, as in https://discuss.ray.io/t/cant-pickle-pyarrow-dataset-expression/1685/8; another often-useful workaround is to avoid defining custom classes in the entrypoint script (if that was the case here).
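
To illustrate the "alternative representation" idea with a standard-library-only sketch (the `ExpressionHolder` class and its fields are hypothetical, not from the original report; a lambda stands in for the non-picklable Cython object): store a picklable form of the object, drop the live object in `__getstate__`, and rebuild it in `__setstate__`.

```python
import pickle

class ExpressionHolder:
    """Holds a non-picklable member (a lambda here, standing in for a
    Cython-backed pyarrow expression) plus a picklable source string."""

    def __init__(self, source):
        self.source = source          # picklable alternative representation
        self.compiled = eval(source)  # non-picklable live object

    def __getstate__(self):
        # Serialize only the string; the lambda would break pickling.
        return {"source": self.source}

    def __setstate__(self, state):
        # Rebuild the non-picklable member from the string on load.
        self.source = state["source"]
        self.compiled = eval(self.source)

holder = ExpressionHolder("lambda x: x + 1")
restored = pickle.loads(pickle.dumps(holder))
print(restored.compiled(41))  # 42 -- the rebuilt callable works after a round-trip
```

The same pattern applies when passing such objects to Ray tasks: ship the cheap representation and reconstruct the heavy object on the worker.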


Yes, we identified the issue and fixed it by not passing the Cython object to Ray.

When the error is reported, it would be useful if the error message contained the type of the object that caused the serialization failure. That would make it much easier for the caller to identify the problem.
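
Until the error message carries that information, a rough stdlib-only diagnostic can narrow it down by trying to pickle each attribute in turn (the `find_unpicklable` helper and `Wrapper` class below are hypothetical sketches; recent Ray versions also ship `ray.util.inspect_serializability` for this purpose, if I recall correctly):

```python
import pickle

def find_unpicklable(obj, path="obj"):
    """Report which attribute(s) of `obj` fail to pickle.

    A rough diagnostic sketch (no cycle handling): try to pickle the
    object itself; on failure, descend into its __dict__ to narrow
    down the culprit.
    """
    try:
        pickle.dumps(obj)
        return []  # picklable; nothing to report
    except Exception:
        pass
    children = vars(obj) if hasattr(obj, "__dict__") else {}
    if not children:
        # Leaf object with no inspectable attributes: report it.
        return [(path, type(obj).__name__)]
    failures = []
    for name, child in children.items():
        failures.extend(find_unpicklable(child, f"{path}.{name}"))
    return failures or [(path, type(obj).__name__)]

class Wrapper:
    def __init__(self):
        self.data = [1, 2, 3]
        self.handle = (x for x in range(3))  # generators are not picklable

print(find_unpicklable(Wrapper()))  # [('obj.handle', 'generator')]
```

Running this on the object that triggers the Ray serialization error points directly at the offending member and its type.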