Read_binary_files does not load data from S3 in parallel

I tried to use Ray Data for parallel processing of binary data stored in my S3 bucket. There are lots of files I want to process (PDFs, images, doc files, .ical files, etc.), and their sizes range from a few KB up to 100-200 MB.

I wanted to use Ray Data, specifically the ray.data.read_binary_files method, to load the data and then process each file in parallel with a custom actor. The problem, however, is that this call appears to download all the files I provide (a list of s3:// paths, one per file) in a single process, with no parallelism at all. Neither the concurrency argument nor any other modification helps.

How, then, do I load lots of files (petabytes of data) from S3 in parallel?

P.S. I run Ray on my EC2 instance in a conda environment in a Jupyter notebook.
P.P.S. My Ray version is 2.9.3.

import ray

ray.init(num_cpus=16)

paths = ['s3://...', 's3://...', ...]

ds = ray.data.read_binary_files(
    paths,
    include_paths=True
)
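
For context, the processing step I have in mind looks roughly like this (ProcessFile is a simplified stand-in for my actual actor, and the concurrency value is only an example):

# Simplified stand-in for my actual actor; passing a callable class to
# ds.map makes Ray Data run it as an actor pool, with `concurrency`
# controlling the pool size.
class ProcessFile:
    def __call__(self, row):
        # Each row has "bytes" and "path" columns because
        # include_paths=True was passed to read_binary_files.
        data = row["bytes"]
        # ... parse the pdf/image/doc/ical payload here ...
        return {"path": row["path"], "num_bytes": len(data)}

processed = ds.map(ProcessFile, concurrency=8)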

It could be because the auto-detected parallelism is too small. You can set override_num_blocks=N to manually set a larger parallelism.
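
A minimal sketch of that suggestion (the block count of 1024 is arbitrary; on Ray 2.9.x, where override_num_blocks may not be available yet, the older parallelism keyword plays the same role):

import ray

ray.init(num_cpus=16)

paths = ['s3://...', 's3://...', ...]

# Force the read to be split into many blocks so Ray Data schedules
# many parallel read tasks instead of a handful of large ones.
# 1024 is only an example value.
ds = ray.data.read_binary_files(
    paths,
    include_paths=True,
    override_num_blocks=1024,  # on Ray 2.9.x, use parallelism=1024 instead
)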