I’m working on a project where I need to run hierarchical agglomerative clustering on between 1 and 10 million documents at a time. I also need to use a custom distance function (I cannot use euclidean space). Does anyone know of any efficient distributed implementations that I might be able to use on top of a ray cluster? Any advice is welcome!
How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.