Large Scale Hierarchical Clustering With Ray

Michelle_JanneyCoyle · July 26, 2023, 11:11pm

Hi everyone,

I’m working on a project where I need to run hierarchical agglomerative clustering on 1 to 10 million documents at a time. I also need to use a custom distance function, so Euclidean distance is not an option. Does anyone know of an efficient distributed implementation that I could run on top of a Ray cluster? Any advice is welcome!
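
To make the question concrete, the single-machine version of the task can be sketched roughly as below. This is only an illustration under assumptions: `jaccard_distance` is a hypothetical stand-in for the custom distance (the actual function is not specified here), single linkage and the `threshold` value are arbitrary choices, and the naive merge loop is far too slow for millions of documents, which is exactly why a distributed implementation is needed.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Hypothetical stand-in for the custom distance: 1 - |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage(docs, distance, threshold):
    """Naive agglomerative clustering with single linkage.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Illustrative only: every pass
    rescans all cluster pairs, so it does not scale.
    """
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(
                distance(docs[i], docs[j])
                for i in clusters[x]
                for j in clusters[y]
            )
            if best is None or d < best[0]:
                best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        # x < y is guaranteed by combinations, so deleting y keeps x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy documents represented as token lists (made up for illustration).
docs = [
    ["ray", "cluster", "scaling"],
    ["ray", "cluster", "autoscaling"],
    ["hierarchical", "clustering", "documents"],
    ["hierarchical", "clustering", "distance"],
]
clusters = single_linkage(docs, jaccard_distance, threshold=0.6)
print(clusters)
```

The pairwise distance evaluations inside the loop are independent of one another, so that is the part that would presumably need to be spread across Ray workers.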
Thank you!

How severely does this issue affect your experience of using Ray?

Medium: It makes completing my task significantly more difficult, but I can work around it.
