[Workflows] S3 storage results in extremely slow workflow scheduling

I’m executing a small DAG (about 30 tasks) using workflows with S3 as storage. The problem is that workflow initialization is extremely slow (see the log timestamps):

(Scheduler pid=5463) 2023-04-16 08:52:20,301    INFO workflow_access.py:356 -- Initializing workflow manager...
(Scheduler pid=5463) 2023-04-16 08:53:41,355    INFO api.py:203 -- Workflow job created. [id="workflow_0736d415-212d-4d02-bb07-8094740f7f54.1681624333.095297098_0"].
(WorkflowManagementActor pid=5466) 2023-04-16 08:58:28,482      INFO workflow_executor.py:86 -- Workflow job [id=workflow_0736d415-212d-4d02-bb07-8094740f7f54.1681624333.095297098_0] started.
(_workflow_task_executor_remote pid=5465) 2023-04-16 08:58:32,393       INFO task_executor.py:78 -- Task status [RUNNING]       [workflow_0736d415-212d-4d02-bb07-8094740f7f54.1681624333.095297098_0@workflow_0736d415-212d-4d02-bb07-8094740f7f54.1681624333.095297098_0_catalog_df_0_0]

It takes about 5-10 minutes for the workflow to start executing tasks. With local storage everything is instant.

Ray 2.3.1; tried with pyarrow 8.0.0 and 10.0.1, same results. Ran both locally and on Kubernetes with minikube, same results.
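
For reference, the workflow is launched roughly like this; the bucket/prefix and the task body below are placeholders, not the real code:

```python
import ray
from ray import workflow

# Workflow storage comes from the Ray storage URI (placeholder bucket/prefix).
ray.init(storage="s3://my-bucket/ray-workflows")

@ray.remote
def step(x):
    # The real tasks transform dataframes; the body is irrelevant to the issue.
    return x + 1

# Chain ~30 tasks into a DAG and run it as a workflow.
node = step.bind(0)
for _ in range(29):
    node = step.bind(node)

result = workflow.run(node, workflow_id="catalog-dag")
```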

I tracked the S3 contents while Ray hangs, and it looks like it populates a duplicate_name_counter/ folder (what is it for?) in S3 for each task in the DAG. Does that happen serially, with a separate call per task? If so, this could be one of the problems: a DAG with hundreds of tasks would mean hundreds of sequential calls to S3.
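
As a rough sanity check on the per-call cost, a single small write through pyarrow's S3 filesystem (which, as far as I understand, is what the storage layer goes through) can be timed like this; the bucket/prefix is a placeholder:

```python
import time
from pyarrow import fs

# Placeholder URI; roughly the kind of small object the workflow metadata writes.
s3, base = fs.FileSystem.from_uri("s3://my-bucket/ray-workflows")

start = time.perf_counter()
with s3.open_output_stream(f"{base}/latency_probe") as f:
    f.write(b"0")
print(f"one small PUT took {time.perf_counter() - start:.2f}s")
```

Even if each such round trip takes a few hundred milliseconds, 30 of them done serially would only add on the order of 10 seconds, so either there are several metadata writes per task or something else accounts for the remaining minutes.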

The above happens at the INFO workflow_access.py:356 -- Initializing workflow manager... line and takes about 2 minutes. What happens after the INFO api.py:203 -- Workflow job created. line? Why the additional ~5 minutes?
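
Happy to collect more detail if it helps; I was going to re-run with debug logging enabled, roughly like this (the logger name is a guess based on the file paths in the log lines above):

```python
import logging
import ray

# Bump the workflow loggers to DEBUG; "ray.workflow" is a guess from the
# ray/workflow/*.py paths shown in the log output above.
logging.getLogger("ray.workflow").setLevel(logging.DEBUG)

# Placeholder bucket/prefix, same setup as in the repro sketch.
ray.init(storage="s3://my-bucket/ray-workflows", logging_level=logging.DEBUG)
```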