I am trying to parallelize Rasa's DaskGraphRunner class in order to run machine learning model training on a Ray multi-node cluster.
As per the Dask on Ray documentation, I replaced the scheduler `dask.get` (line 101) with Ray's scheduler `ray_dask_get`, but ran into problems serializing the `GraphNode` class, specifically pickling SQLAlchemy objects:

TypeError: can't pickle sqlalchemy.cprocessors.UnicodeResultProcessor objects
TypeError: Could not serialize the argument <rasa.engine.graph.GraphNode object at 0x7f6bb3840390> for a task or actor ray.util.dask.scheduler.dask_task_wrapper.
How should I go about this?
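For context on what that swap actually does: a Dask graph is just a dict mapping keys to tasks, and a scheduler is any `get(dsk, key)` function, so switching from `dask.get` to `ray_dask_get` only changes which function walks the graph. A toy sketch of that convention (not Dask's real implementation):

```python
def sync_get(dsk, key):
    """Minimal single-process 'scheduler': resolve a key by running tasks in-place."""
    task = dsk[key]
    if isinstance(task, tuple):  # Dask's (func, *args) task convention
        func, *args = task
        # arguments that name other keys are resolved recursively
        return func(*(sync_get(dsk, a) if a in dsk else a for a in args))
    return task  # a literal value

graph = {"x": 1, "y": 2, "z": (lambda a, b: a + b, "x", "y")}
print(sync_get(graph, "z"))  # → 3
```

The synchronous scheduler runs everything inside one process, so nothing is ever pickled; a distributed scheduler like `ray_dask_get` has to serialize every task's arguments before shipping them to workers, which is exactly where `GraphNode` fails.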
My guess is that Rasa is keeping an open database connection in `GraphNode`, and that connection contains this `UnicodeResultProcessor` C-extension object, which isn't picklable. Rasa likely hasn't hit this issue because they use Dask's single-threaded synchronous scheduler, under which no pickling/serialization needs to take place.
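This failure mode is easy to reproduce without Rasa or Ray: any object holding a live database connection fails to pickle with the same kind of TypeError. A minimal sketch, using a sqlite3 connection as a stand-in for the SQLAlchemy one:

```python
import pickle
import sqlite3

class Node:
    """Stand-in for a graph node that keeps a live DB connection as an attribute."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")  # live connections are not picklable

try:
    pickle.dumps(Node())
except TypeError as exc:
    print(exc)  # e.g. "cannot pickle 'sqlite3.Connection' object"
```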
There are a few options here:
- Fix the issue in upstream Rasa. This would involve them implementing/using a serializable database connection.
- Register a custom serializer with Ray that serializes the connection (or even the entire `GraphNode`). See option (2) here.
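If you go the custom-serializer route, the usual pattern is to drop the live connection at serialization time and reopen it on the worker after deserialization. A sketch of that idea using pickle's `__getstate__`/`__setstate__` hooks; the same serializer/deserializer pair could instead be registered with Ray via `ray.util.register_serializer`. `ReconnectingNode` and its sqlite3 connection are illustrative stand-ins, not Rasa's actual classes:

```python
import pickle
import sqlite3

class ReconnectingNode:
    """Stand-in node whose DB connection survives pickling by being rebuilt."""
    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = sqlite3.connect(db_path)

    def __getstate__(self):
        # drop the live (unpicklable) connection before pickling
        state = self.__dict__.copy()
        state["conn"] = None
        return state

    def __setstate__(self, state):
        # reopen the connection after unpickling, e.g. on a Ray worker
        self.__dict__.update(state)
        self.conn = sqlite3.connect(self.db_path)

node = ReconnectingNode()
clone = pickle.loads(pickle.dumps(node))
clone.conn.execute("select 1")  # the clone has a working connection again
```

Note that reopening per worker assumes the database is reachable from every node in the cluster.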
Hi @Clark_Zinzow ,
thanks for the suggestions. It makes sense that no serialization needs to take place since the scheduler is single-threaded. Before writing and registering a custom serializer (option 2), I need to find where this connection is created. I've opened a thread about that on Rasa's forum.
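One way to narrow that down without reading all of Rasa's source is to walk an object's attributes and try pickling each one in isolation. A small hypothetical helper (`find_unpicklable` is not a Rasa or Ray API):

```python
import pickle

def find_unpicklable(obj):
    """Return (attribute, error) pairs for attributes of obj that fail to pickle."""
    failures = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:
            failures.append((name, repr(exc)))
    return failures
```

Running this on the offending `GraphNode` (and recursively on any failing attribute) should point straight at the object holding the `UnicodeResultProcessor`.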