Why isn't `ray.data.read_api._get_reader` parallelized?

I am using Ray Data to read 100 Parquet files (~15 MB each) from S3 as follows:

```python
import time

import ray

start = time.perf_counter()
ds = ray.data.read_parquet("s3://<dir path>")
materialized_ds = ds.materialize()
end = time.perf_counter()

# Plan to do some other data processing afterwards.
# materialized_ds.map_batches(...)
```

This takes 10+ minutes to finish. When I checked the trace file from the Ray Dashboard, I found that `ray.data.read_api._get_reader` runs sequentially on a single worker. Can this step be parallelized across multiple workers?

I also tried repartitioning the input data into fewer but larger files, manually tuning `parallelism`, and using the `read_parquet_bulk` API instead, but `ray.data.read_api._get_reader` still executes on only a single worker.
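For reference, the workaround attempts above looked roughly like the sketch below. This is illustrative, not exact: the function names are mine, the S3 path is a placeholder, and in newer Ray versions the `parallelism` keyword is renamed to `override_num_blocks`.

```python
def read_with_tuned_parallelism(path, n=200):
    # Attempt 1: request more output blocks, hoping the read fans out
    # across workers. (Newer Ray versions call this kwarg
    # `override_num_blocks` instead of `parallelism`.)
    import ray
    return ray.data.read_parquet(path, parallelism=n)


def read_bulk(paths):
    # Attempt 2: read_parquet_bulk takes an explicit list of file paths
    # and skips reading Parquet metadata up front.
    import ray
    return ray.data.read_parquet_bulk(paths)
```

Neither variant changed what the trace shows: `_get_reader` still appears as a single sequential task.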