I am using ParallelIterator to parse data that is stored in protobuf format, and I am getting the following error:
TypeError: cannot pickle 'google.protobuf.pyext._message.DescriptorPool' object
Is there any solution to this problem? I have read that protobuf messages cannot be pickled.
Can you show me the script that has the issue? Most likely your script looks something like this:
a = protobuf_def
def f():
    # in this function you capture a
    # and you use f for the parallel iterator or something similar
    ...
import ray

f_list = [...]  # list of file names
iter2 = (
    ray.util.iter.from_items(f_list, num_shards=20).batch(8000)
    .for_each(lambda obj_ref: read_and_parse(obj_ref))
)
def read_and_parse(lst):
    for p in lst:
        ray.put(read_blob(p))
Hi, I converted the protobuf message to a dict, and now I am able to store the object:
from google.protobuf.json_format import MessageToDict
dict_obj = MessageToDict(blob)
ray.put(dict_obj)
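For anyone who wants to see why the conversion helps, here is a minimal sketch using only the standard library. It is not real protobuf or Ray code: FakeMessage is a hypothetical stand-in for a parsed message, a threading.Lock plays the role of the unpicklable DescriptorPool, to_plain is a hypothetical helper analogous to MessageToDict, and plain pickle stands in for the serialization that ray.put performs.

```python
import pickle
import threading


class FakeMessage:
    """Hypothetical stand-in for a parsed protobuf message (not a real protobuf class)."""

    def __init__(self, name, values):
        self.name = name
        self.values = values
        # A lock cannot be pickled; it plays the role of the DescriptorPool here.
        self._pool = threading.Lock()


def to_plain(msg):
    """Copy only the data fields into a dict, analogous to MessageToDict."""
    return {"name": msg.name, "values": list(msg.values)}


def can_pickle(obj):
    """Return True if pickle can serialize obj, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


msg = FakeMessage("blob-1", [1, 2, 3])
plain = to_plain(msg)

# The plain dict survives a pickle round trip; the original object does not.
restored = pickle.loads(pickle.dumps(plain))
```

The point is that only plain Python data (dicts, lists, strings, numbers) crosses the serialization boundary, so the unpicklable internals never need to be pickled.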
I think the issue was that when you passed f_list, it couldn't be serialized because its items contained protobuf objects. Glad you found a way to get around it.