Ray Data Performance Issues

Preeti_Joshi · January 19, 2022, 6:32pm

Hi Ray team,

We are trying to:

1. Load a large dataset (4 billion plus rows)
2. Shard it across a Ray cluster
3. Perform a task (such as executing a query) on each shard
4. Get the results from each shard
5. Combine the results

To achieve this, we have tried the following so far:

- Cluster config: 4 nodes (16 CPU cores, 64 GB memory, and 1 TB of object store)
- Stateful actors for the data load, attached to a placement group with a STRICT_SPREAD strategy, guaranteeing one worker per shard
- Execute the task (an actor method) using the same worker
- Run another Ray task to collect the results
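
For readers following along, the shape of this workflow (shard, run a task per shard, combine) can be sketched in plain Python. This is only a stand-in built on the standard library's `concurrent.futures`, not Ray's API: `ShardWorker`, `make_shards`, `combine`, and the "even numbers" query are hypothetical names chosen for illustration, and a small integer range stands in for the real dataset.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shards(rows, num_shards):
    """Split rows round-robin into num_shards lists."""
    return [rows[i::num_shards] for i in range(num_shards)]


class ShardWorker:
    """Holds one shard in memory, playing the role of a stateful actor."""

    def __init__(self, shard):
        self.shard = shard

    def execute(self, predicate):
        # Return a small partial result (count, sum) instead of raw rows,
        # so the combine step has very little data to move.
        matched = [r for r in self.shard if predicate(r)]
        return len(matched), sum(matched)


def combine(partials):
    """Merge per-shard (count, sum) pairs into one global result."""
    total_count = sum(c for c, _ in partials)
    total_sum = sum(s for _, s in partials)
    return total_count, total_sum


def is_even(n):
    return n % 2 == 0


rows = list(range(1_000))  # toy stand-in for the real dataset
workers = [ShardWorker(s) for s in make_shards(rows, num_shards=4)]

# One worker per shard; each runs the same query on its own data.
with ThreadPoolExecutor(max_workers=len(workers)) as pool:
    partials = list(pool.map(lambda w: w.execute(is_even), workers))

result = combine(partials)
print(result)
```

In Ray the same shape would typically be one actor per shard, with `ray.get` called on the list of object refs returned by the actor method calls. Returning aggregated partials rather than raw rows is a design choice that keeps the collection step cheap.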

Questions:

With the following placement_group settings:

bundles = [{"CPU": 1}, {"CPU": 1}, {"CPU": 1}, {"CPU": 1}]
strategy="STRICT_SPREAD"
num_cpus=4

Is this the optimum configuration? How do we utilize all the available cores?
Currently it is not clear whether the above config is able to achieve that.

Will bumping up the CPUs in each bundle, as shown here, help?

bundles = [{"CPU": 2}, {"CPU": 2}, {"CPU": 2}, {"CPU": 2}]
strategy="STRICT_SPREAD"
num_cpus=8
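
A rough back-of-the-envelope check on the two configurations above, assuming that a bundle's `CPU` value is a logical reservation used for scheduling and that the 16 cores in the cluster config are per node (64 in total). `reserved_fraction` is a hypothetical helper written for this sketch, not part of Ray.

```python
NODES = 4
CORES_PER_NODE = 16  # assumption: the 16 cores listed are per node


def reserved_fraction(bundles, total_cpus=NODES * CORES_PER_NODE):
    """Fraction of the cluster's CPUs reserved by a list of bundles."""
    reserved = sum(b.get("CPU", 0) for b in bundles)
    return reserved / total_cpus


one_cpu = [{"CPU": 1}] * NODES
two_cpu = [{"CPU": 2}] * NODES
full_node = [{"CPU": CORES_PER_NODE}] * NODES  # hypothetical alternative

for name, bundles in [
    ("1-CPU", one_cpu),
    ("2-CPU", two_cpu),
    ("full-node", full_node),
]:
    print(f"{name} bundles reserve {reserved_fraction(bundles):.2%} of the cluster")
```

Under these assumptions the first configuration reserves 4 of 64 cores and the second 8 of 64. Reserving more CPUs changes what the scheduler sets aside; whether those cores are actually kept busy still depends on the code inside the actor running in parallel.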

We observe slowness while collecting the results.

@ray.remote
def collect_results():
    # code to collect the results from each shard
    ...

How does Ray ensure that the task utilizes all available cores on the given worker?

Please review and provide your feedback on this.

Thanks,
Preeti

Preeti_Joshi · January 25, 2022, 12:31pm

@Alex, @Clark_Zinzow, @sangcho, can you please provide your feedback or pointers on the performance issues mentioned above? Any guidelines on tuning the performance would be very helpful.
