How to download data?

zengxinliang · December 4, 2025, 4:05pm

RunLLM · December 4, 2025, 4:06pm

The Ray Data benchmarks evaluate performance across workloads such as image classification, document embedding, audio transcription, video object detection, and large-scale image embedding. According to the official benchmarks, Ray Data (v2.50) consistently outperforms Daft (v0.6.2), with speedups ranging from 1.2x to 7.1x. The largest gain is in large-scale image embedding (105.81s for Ray Data vs. 752.75s for Daft). All results are averaged over four runs, with a warmup phase to remove startup overheads, and detailed workload configurations are provided for reproducibility.
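
To make the methodology concrete, here is a minimal sketch of a "warmup, then average over four runs" timing harness, plus the speedup arithmetic for the image embedding figures quoted above. This is not the official benchmark code: the `benchmark` and `speedup` helpers are hypothetical names chosen for illustration, and the single warmup run is an assumption (the post only says a warmup phase is used).

```python
import time
from statistics import mean


def benchmark(workload, runs=4, warmup=1):
    """Return the mean wall-clock time of `workload` in seconds.

    Hypothetical helper, not part of Ray Data or Daft. The warmup
    count of 1 is an assumption made for this sketch.
    """
    for _ in range(warmup):
        workload()  # warmup run: executed but not timed

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return mean(timings)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Large-scale image embedding figures from the post:
# 752.75s for Daft (baseline) vs. 105.81s for Ray Data.
print(round(speedup(752.75, 105.81), 1))  # prints 7.1
```

You would pass your own pipeline as `workload`, for example a zero-argument function that runs one end-to-end job. The 7.1 printed here matches the upper end of the 1.2x to 7.1x range quoted above.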

Would you like more detail on specific workloads, cluster configurations, or code examples from these benchmarks?
