What are the benefits of integrating Ray and Spark?

Hello,

I am new to Ray; I used to do some work with Apache Spark. Recently, I found that some researchers are integrating Spark and Ray for distributed machine learning, and I am interested in this.

I want to know whether the integration of Spark and Ray is realized by Spark opening multiple independent Ray tasks for each executor, as in RayOnSpark developed by Intel. And what is the benefit of these projects?

Can anyone please provide some thoughts on this?

The main benefit of these projects is that they let you easily use Spark as part of your larger Ray program. For example, you could use Spark to load and featurize data and put it into the Ray object store, then feed it into your ML training pipeline natively using Python code instead of materializing to external storage…
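To make that pattern concrete, here is a minimal sketch of the hand-off being described; the tiny in-memory DataFrame and the toy `train` task are illustrative placeholders rather than any project's actual API. Spark produces featurized data, the driver puts it into Ray's object store with `ray.put`, and a Ray task consumes it directly without writing to external storage:

```python
import ray
import numpy as np
from pyspark.sql import SparkSession

ray.init()

# Use Spark for loading/featurizing; this tiny local example stands in
# for a real preprocessing job.
spark = SparkSession.builder.master("local[2]").appName("featurize").getOrCreate()
df = spark.createDataFrame([(1.0, 0), (2.0, 1), (3.0, 0)], ["feature", "label"])
features = df.toPandas()  # collect the featurized data to the driver
spark.stop()

# Put the featurized data into Ray's object store so downstream tasks
# can consume it without materializing to external storage.
features_ref = ray.put(features)

@ray.remote
def train(batch):
    # Placeholder "training" step: any Python ML code could run here.
    X = np.asarray(batch["feature"]).reshape(-1, 1)
    y = np.asarray(batch["label"])
    return {"n_rows": len(y), "mean_feature": float(X.mean())}

print(ray.get(train.remote(features_ref)))
```

In the actual integrations (RayOnSpark, RayDP) the exchange happens in a distributed fashion rather than through a single `toPandas()` collect on the driver; the sketch only illustrates the object-store hand-off described above.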

@sangcho or @ericl may also like to chime in here with some more details!


There are different options for integrating Ray and Spark. RayOnSpark allows you to run Ray programs in your existing Spark cluster. Another option is RayDP, which runs Spark on Ray. RayDP uses Ray as the substrate and runs Spark and other ML/DL frameworks on top of it. This makes it simple to build a distributed, end-to-end data analytics and AI pipeline: you use Spark for data preprocessing and integrate with the ML/DL frameworks available on Ray, including RaySGD, Horovod, XGBoost, etc., with efficient data exchange through Ray’s object store.
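For reference, a RayDP program looks roughly like the sketch below. The `raydp.init_spark` arguments and the conversion utilities mentioned in the comments are based on my reading of the RayDP documentation and may differ across versions, so treat this as a sketch rather than the exact API:

```python
import ray
import raydp

ray.init()

# Start a Spark session whose executors run on the Ray cluster.
# Parameter names follow the RayDP README; check your RayDP version
# for the exact signature.
spark = raydp.init_spark(
    app_name="raydp_example",
    num_executors=2,
    executor_cores=2,
    executor_memory="2GB",
)

# Plain PySpark code for preprocessing.
df = spark.range(0, 1000).withColumnRenamed("id", "feature")
print(df.count())

# Downstream, the preprocessed DataFrame can be handed to Ray-based
# ML/DL libraries (e.g., via RayDP's dataset conversion utilities),
# with the data exchanged through Ray's object store instead of
# being written out to external storage.

raydp.stop_spark()
ray.shutdown()
```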
