How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty in completing my task, but I can work around it.
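I am reading CSV files from S3 like this:

    import ray

    # s3, partition_filter, and convert_options are defined elsewhere in my script.
    dataset = ray.data.read_csv(
        s3_bucket_name + '/' + s3_folder_path_prefix + '/',
        partition_filter=partition_filter,
        filesystem=s3,
        convert_options=convert_options,
    )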
When I materialize the dataset, the read completes, but it keeps pausing or proceeds very slowly, with only occasional bursts of progress. I monitored the process with top and saw that most of the worker processes are sleeping and only a few are running. Sometimes many of them start running at once, and that is when the read progress bar moves forward.
I am using a 64-CPU instance and trying to use all of the CPUs to read the data, yet the read is very slow because most of the processes sit idle. Is this a bug, or is there a parameter I should change? A simple multiprocessing-based read function seems to be much faster than this.
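
For reference, the read-side parameters that seem most relevant here can be sketched as follows. This is only a minimal sketch, assuming a recent Ray release in which `ray.data.read_csv` accepts `override_num_blocks` (older releases used `parallelism` instead) and `ray_remote_args`. The helper names `build_read_kwargs` and `read_with_tuning` are hypothetical, and the default values are starting points to experiment with, not recommendations.

```python
def build_read_kwargs(total_cpus, blocks_per_cpu=4, cpus_per_read_task=0.25):
    """Hypothetical helper: build keyword arguments to try with ray.data.read_csv.

    total_cpus: number of CPUs on the instance (64 in this post).
    blocks_per_cpu: assumed multiplier, so there are more read tasks than CPUs.
    cpus_per_read_task: assumed fractional CPU request, since S3 reads are
        mostly I/O-bound and several of them can share one core.
    """
    if total_cpus < 1 or blocks_per_cpu < 1:
        raise ValueError("total_cpus and blocks_per_cpu must be >= 1")
    if cpus_per_read_task <= 0:
        raise ValueError("cpus_per_read_task must be positive")
    return {
        # Number of output blocks, and therefore read tasks, to request.
        "override_num_blocks": total_cpus * blocks_per_cpu,
        # Resources requested by each read task.
        "ray_remote_args": {"num_cpus": cpus_per_read_task},
    }


def read_with_tuning(path, total_cpus=64, **read_csv_kwargs):
    """Hypothetical wrapper: call ray.data.read_csv with the tuned arguments."""
    import ray  # imported lazily so the helper above works without Ray installed

    tuned = build_read_kwargs(total_cpus)
    return ray.data.read_csv(path, **tuned, **read_csv_kwargs)
```

With the call from this post, that would look like `read_with_tuning(s3_bucket_name + '/' + s3_folder_path_prefix + '/', partition_filter=partition_filter, filesystem=s3, convert_options=convert_options)`. Whether these settings actually remove the pauses would need to be measured on the real data.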