[Dask on Ray] Parallelizing Rasa's DaskGraphRunner - Problem with serializing SQLAlchemy objects

Hi everyone,

I am trying to parallelize Rasa’s DaskGraphRunner class in order to run machine learning model training on a Ray multi-node cluster.

As per the Dask on Ray documentation, I replaced line 101, which contains the scheduler dask.get, with Ray’s scheduler ray_dask_get, but ran into problems serializing the GraphNode class, specifically pickling SQLAlchemy objects:

TypeError: can't pickle sqlalchemy.cprocessors.UnicodeResultProcessor objects
TypeError: Could not serialize the argument <rasa.engine.graph.GraphNode object at 0x7f6bb3840390> for a task or actor ray.util.dask.scheduler.dask_task_wrapper. 
Check https://docs.ray.io/en/master/serialization.html#troubleshooting for more information.

How should I go about this?

My guess is that Rasa is including an open database connection in GraphNode, and that connection contains this UnicodeResultProcessor C-extension object, which is not picklable. It looks like Rasa hasn’t hit this issue because they’re using the single-threaded synchronous Dask scheduler, which runs every task in the same process, so no pickling/serialization needs to take place.
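To illustrate the difference, here is a minimal sketch using only the standard library. `FakeNode` and both `run_*` helpers are hypothetical, and a `threading.Lock` stands in for the SQLAlchemy object from the traceback; this is not Rasa's or Ray's actual code path.

```python
import pickle
import threading


class FakeNode:
    """Hypothetical stand-in for a graph node that holds a live resource."""

    def __init__(self):
        # A lock plays the role of the open database connection here:
        # like the C-extension object in the traceback, it cannot be pickled.
        self.resource = threading.Lock()

    def run(self):
        return "ran"


def run_in_process(node):
    # Same-process execution: the node is passed by reference, nothing is pickled.
    return node.run()


def run_remotely(node):
    # A distributed scheduler has to serialize task arguments before shipping them.
    payload = pickle.dumps(node)
    return pickle.loads(payload).run()


node = FakeNode()
in_process_result = run_in_process(node)

try:
    run_remotely(node)
    remote_error = None
except (TypeError, pickle.PicklingError) as exc:
    remote_error = exc
```

The in-process call succeeds, while the "remote" call fails during `pickle.dumps`, which is the same kind of failure the Ray scheduler surfaces.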

There are a few options here:

  1. Fix the issue in upstream Rasa. This would involve them implementing/using a serializable database connection.
  2. Register a custom serializer with Ray that serializes the connection (or even the entire GraphNode). See option (2) here.
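A rough sketch of what option 2 could look like. Everything named `Fake*` is a hypothetical stand-in (I don't know how GraphNode actually stores the connection), and the sketch assumes the connection can be reopened from a URL on the receiving side. The round trip below uses plain `pickle` so it runs without Ray.

```python
import pickle
import threading


class FakeConnection:
    """Hypothetical stand-in for an unpicklable database connection."""

    def __init__(self, url):
        self.url = url
        self._lock = threading.Lock()  # makes instances unpicklable


class FakeGraphNode:
    """Hypothetical stand-in for rasa.engine.graph.GraphNode."""

    def __init__(self, name, connection):
        self.name = name
        self.connection = connection


def serialize_node(node):
    # Keep only plain data: the node name and what is needed to reconnect.
    return {"name": node.name, "url": node.connection.url}


def deserialize_node(state):
    # Rebuild the node and open a fresh connection on the receiving side.
    return FakeGraphNode(state["name"], FakeConnection(state["url"]))


original = FakeGraphNode("train", FakeConnection("sqlite:///example.db"))

# Simulate the wire transfer: only the plain dict is ever pickled.
wire_bytes = pickle.dumps(serialize_node(original))
restored = deserialize_node(pickle.loads(wire_bytes))
```

With Ray, the same pair of functions would be registered once before submitting tasks, along the lines of `ray.util.register_serializer(FakeGraphNode, serializer=serialize_node, deserializer=deserialize_node)`. Note that the restored node holds a new connection, not the original one, so any in-flight transaction state is lost.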

Hi @Clark_Zinzow ,

Thanks for the suggestions. It makes sense that no serialization needs to take place since the scheduler is single-threaded. Before writing and registering the custom serializer (option 2), I need to find where this connection is created. I have opened the same thread on Rasa’s forum for that.
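One way to narrow down where the connection lives is to walk the object's attributes and report the ones that fail to pickle. Below is a minimal sketch of that idea; `find_unpicklable` is a hypothetical helper, and the `Fake*` classes and the `kwargs` / `tracker_store` names are made up for the demo, not taken from Rasa. Ray also provides `ray.util.inspect_serializability`, which should serve a similar purpose on the real object.

```python
import pickle
import threading


def find_unpicklable(obj, path="node", _seen=None):
    """Return the attribute paths under `obj` whose values fail to pickle."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return []  # already visited, avoids infinite loops on cycles
    seen.add(id(obj))

    try:
        pickle.dumps(obj)
        return []  # this whole subtree serializes fine
    except Exception:
        pass

    # Collect children to descend into, depending on the container type.
    if isinstance(obj, dict):
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, (list, tuple, set)):
        children = [(f"{path}[{i}]", value) for i, value in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{name}", value) for name, value in vars(obj).items()]
    else:
        children = []

    culprits = []
    for child_path, child in children:
        culprits.extend(find_unpicklable(child, child_path, seen))

    # If no child explains the failure, this object itself is the culprit.
    return culprits or [path]


class FakeConnection:
    """Hypothetical connection holding an unpicklable member."""

    def __init__(self):
        self.url = "sqlite:///example.db"
        self.processor = threading.Lock()  # stands in for the C-extension object


class FakeGraphNode:
    """Hypothetical node that hides the connection inside a dict."""

    def __init__(self):
        self.name = "train"
        self.kwargs = {"tracker_store": FakeConnection()}


culprits = find_unpicklable(FakeGraphNode())
print(culprits)
```

In this toy example the helper points at `node.kwargs['tracker_store'].processor`; on the real GraphNode the reported path should show which attribute drags in the SQLAlchemy object.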

