Arrow Flight and Ray Data

Hi,

The Apache Arrow project is building a new protocol called Arrow Flight. Are you planning to integrate it with the Ray ecosystem? That way you could run complex transformations on data and then serve the results regardless of size, efficiently and without additional serialization/deserialization (SerDe) steps.


@Chen_Shen @Clark_Zinzow could one of you answer this question?

@LuisMoralesAlonso No concrete plans yet to integrate with Arrow Flight, since we haven’t encountered any users interested in using Ray + Flight. Would such an integration be useful for your use case?

I would definitely like this!!

@Clark_Zinzow, sorry for the delay in answering… Yes, it would be perfect for us. We are looking for a data integration layer based on APIs, and we need an efficient one. At this point, Ray (with Modin or other high-level APIs) + Arrow Flight would be the perfect match.

Ray's object manager has its own transfer mechanism that is very similar to Arrow Flight (it predates the Flight architecture), so you get the same advantages as Flight when passing objects to Ray tasks and actors.

This wouldn’t cover systems outside of Ray trying to read Dataset data though.

It would clearly be a step in the right direction to let systems outside Ray consume that implementation, even more so now that Ray has become a distributed implementation of Arrow.

Hi,

Given the rapid growth and adoption of the Apache Arrow ecosystem in the data industry over the past few years, has there been any work at Ray on more native support for Arrow Flight, including support for external systems reading Dataset data?
