I am considering Ray for a new ML data project. I have used Flink in the past, with the Scala API, but for this project the team are heavy Python users (more ML scientists than strong engineers), so I want to use a Python interface. I have found plenty of comparisons of Ray with PySpark and of Flink with Spark, but none of Ray with PyFlink.
For anyone who is familiar with both or has made a similar decision: what were your deciding factors, and what did you learn?
In general, Flink’s watermarking and replay capabilities seem to be the main distinguishing factors. Has anyone compared the performance of the two?