The requested parallelism is too high

awangzy · March 26, 2025, 8:51am

1. Severity of the issue: (select one)
Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

3. What happened vs. what you expected:

Expected: read and process a very big dataset on a Ray cluster with 30+ pods and 3000+ CPUs.
Actual: Too many tasks and actors are created and queued, and the console shows this log message: WARNING util.py:260 – The requested parallelism of 39587 is more than 4x the number of available CPU slots in the cluster of 5700.0. This can lead to slowdowns during the data reading phase due to excessive task creation. Reduce the parallelism to match with the available CPU slots in the cluster, or set parallelism to -1 for Ray Data to automatically determine the parallelism. You can ignore this message if the cluster is expected to autoscale.
My questions: How is the parallelism (39587) calculated? How do I set the parallelism? Which API uses this parallelism?
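
To make the numbers in the warning concrete, here is a minimal sketch of the threshold check the message describes. It is based only on the wording of the log line ("more than 4x the number of available CPU slots"), not on Ray's source code; the function names are hypothetical and written for illustration.

```python
# Sketch of the check implied by the warning text. This is NOT Ray's code;
# the factor of 4 is taken from the log message, and both function names
# are made up for this example.
WARN_FACTOR = 4


def parallelism_exceeds_warning_threshold(requested, cpu_slots, factor=WARN_FACTOR):
    """Return True if the requested parallelism is above factor * cpu_slots."""
    return requested > factor * cpu_slots


def max_parallelism_without_warning(cpu_slots, factor=WARN_FACTOR):
    """Largest parallelism that stays at or below the threshold."""
    return int(factor * cpu_slots)


# Values copied from the log message above.
requested = 39587
cpu_slots = 5700.0

print(parallelism_exceeds_warning_threshold(requested, cpu_slots))  # True
print(max_parallelism_without_warning(cpu_slots))  # 22800
```

So with 5700 CPU slots, any parallelism above 22800 would trigger this message. As far as I understand, this parallelism is the number of read tasks used by the Ray Data read functions, and the warning itself suggests passing a smaller value or -1 as the `parallelism` argument of the read call. I believe newer Ray releases expose this as `override_num_blocks` instead, but that is an assumption to check against the installed version.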
