Large Scale Hierarchical Clustering With Ray

Hi everyone,

I’m working on a project where I need to run hierarchical agglomerative clustering on between 1 and 10 million documents at a time. I also need to use a custom distance function (I cannot assume a Euclidean space). Does anyone know of an efficient distributed implementation that I could run on top of a Ray cluster? Any advice is welcome!
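To make the requirement concrete, here is a minimal single-machine sketch of the kind of computation involved. It is only an illustration under stated assumptions: the Jaccard distance on token sets stands in for the custom metric (it is not the actual one), average linkage is an arbitrary choice, and the names `jaccard_distance` and `average_linkage` are hypothetical. The naive version stores every pairwise distance, which for 1 million documents is roughly 5 × 10^11 pairs, so it cannot work at the scale above without being distributed or approximated.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Hypothetical custom (non-Euclidean) distance between two token sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(docs, distance, num_clusters):
    """Naive agglomerative clustering; returns clusters as lists of doc indices."""
    n = len(docs)
    # All pairwise distances, computed once up front (O(n^2) memory).
    dist = {(i, j): distance(docs[i], docs[j]) for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Tiny made-up example: four "documents" represented as token sets.
docs = [
    {"ray", "cluster", "actor"},
    {"ray", "cluster", "task"},
    {"cat", "dog"},
    {"cat", "dog", "bird"},
]
result = average_linkage(docs, jaccard_distance, num_clusters=2)
print(sorted(sorted(c) for c in result))
```

The pairwise-distance step is the part that looks easiest to split into independent blocks (for example, one Ray task per block of index pairs), while the merge loop is inherently sequential, which is where I would most appreciate pointers.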
Thank you!

How severely does this issue affect your experience of using Ray?

  • Medium: It makes completing my task significantly more difficult, but I can work around it.