Getting Serialization error with protobuf data

I am using Ray's ParallelIterator to parse data that is stored in protobuf format, and I am getting the following error:

TypeError: cannot pickle 'google.protobuf.pyext._message.DescriptorPool' object

Is there any solution to this problem? I have read that protobuf messages cannot be pickled.

Can you show me the script that has the issue? It is highly likely that you have a script of this shape:

a = protobuf_def  # a module-level protobuf message / descriptor object

def f(item):
    # this function captures `a` in its closure, so Ray (via cloudpickle)
    # has to serialize `a` together with `f`
    ...

# and then you pass f to the ParallelIterator, e.g. .for_each(f)
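
If it helps, recent Ray releases also ship a serialization debugging helper, ray.util.inspect_serializability, which prints exactly which captured object cannot be pickled instead of the opaque TypeError. A minimal sketch, where my_pb2, Blob, and parse are hypothetical stand-ins for your generated protobuf module and your callback:

import ray
from ray.util import inspect_serializability  # available in recent Ray releases

import my_pb2  # hypothetical generated protobuf module

def parse(raw_bytes):
    msg = my_pb2.Blob()            # hypothetical message type, pulled in via my_pb2
    msg.ParseFromString(raw_bytes)
    return msg

# Prints a breakdown of what inside `parse` fails to serialize
# (for example, a DescriptorPool captured through the protobuf module).
inspect_serializability(parse, name="parse")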
def read_and_parse(lst):
    for p in lst:
        ray.put(read_blob(p))  # read_blob returns a protobuf message

f_list = [...]  # list of file names
iter2 = (
    ray.util.iter.from_items(f_list, num_shards=20)
        .batch(8000)
        .for_each(read_and_parse)
)

Hi, I converted the protobuf message to a dict, and then I was able to store the object:

from google.protobuf.json_format import MessageToDict

dict_obj = MessageToDict(blob)  # blob is the protobuf message; the result is a plain dict
ray.put(dict_obj)
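
For completeness, this is roughly how the conversion could slot into the read_and_parse function from the snippet above; read_blob is the original poster's helper and is assumed to return a protobuf message:

from google.protobuf.json_format import MessageToDict
import ray

def read_and_parse(lst):
    refs = []
    for p in lst:
        blob = read_blob(p)                         # protobuf message
        refs.append(ray.put(MessageToDict(blob)))   # store a plain dict instead
    return refs

MessageToDict produces only plain dicts, lists, and scalars, which Ray can serialize without touching the protobuf C extension; if a consumer needs the message back, ParseDict from the same google.protobuf.json_format module can rebuild it.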

I think the issue was that when you pass f_list, it couldn't be serialized because the resulting objects contain protobuf messages. Glad you found a way to work around it :)