Why is UDF time larger than Remote wall time with concurrency=1?

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.

I'm trying to understand the Dataset.stats() output and am curious how UDF time is calculated.

From the documentation:

UDF time: The UDF time is time spent in functions defined by the user. You can use this stat to track the time spent in functions you define and how much time optimizing those functions could save.
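For reference, the stats below come from running the pipeline to completion and then printing Dataset.stats(); a minimal sketch (my real code triggers execution differently, but the idea is the same):

ds = ds.materialize()  # run the full pipeline
print(ds.stats())      # per-operator execution statistics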

I have this output:

Dataset dataset_20_0 execution finished in 68.75 seconds

...

Operator 6 MapBatches(ExtractSkills): 47 tasks executed, 47 blocks produced in 34.38s
* Remote wall time: 380.7ms min, 759.54ms max, 533.31ms mean, 25.07s total
* Remote cpu time: 381.44ms min, 756.23ms max, 530.33ms mean, 24.93s total
* UDF time: 566.36ms min, 24.74s max, 13.09s mean, 615.34s total
* Peak heap memory usage (MiB): 0.0 min, 0.0 max, 0 mean
* Output num rows per block: 960 min, 1988 max, 1813 mean, 85223 total
* Output size bytes per block: 2143503 min, 5983006 max, 4076711 mean, 191605428 total
* Output rows per task: 960 min, 1988 max, 1813 mean, 47 tasks used
* Tasks per node: 47 min, 47 max, 47 mean; 1 nodes used
* Operator throughput:
	* Ray Data throughput: 2478.793114569806 rows/s
	* Estimated single node throughput: 3400.0273423089598 rows/s

ExtractSkills is a callable class, run as an actor like this:

ds = ds.map_batches(
    ExtractSkills,          # callable class, so Ray Data runs it as an actor
    batch_format="pandas",
    concurrency=1,          # a single actor worker
    num_cpus=1,
)
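
For context, ExtractSkills has roughly this shape. This is a simplified stand-in, not the real implementation; the column names and extraction logic here are placeholders:

import pandas as pd

class ExtractSkills:
    def __init__(self):
        # One-time setup per actor (e.g. loading a model) goes here.
        pass

    def __call__(self, batch: pd.DataFrame) -> pd.DataFrame:
        # With batch_format="pandas", each batch arrives as a DataFrame.
        # Placeholder logic; the real skill extraction is omitted.
        batch["skills"] = batch["text"].str.split()
        return batch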

How can the UDF time be so large when the whole run took 68.75 seconds? The total does match the per-task stats (47 tasks × 13.09 s mean ≈ 615 s), but I could understand this only with high concurrency, and I have a single actor worker, which should accumulate at most ~68.75 s of wall time.