JobSubmission API Taking more time

Hi,

I am using the following code to submit a job to a Ray cluster:

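from datetime import datetime
from ray.job_submission import JobSubmissionClient

# Connect to the job server on the head node's dashboard port
client = JobSubmissionClient("http://10.60.60.41:8265")
run_script = 'python3 main.py'
# working_dir is uploaded to the cluster before main.py starts
job_id = client.submit_job(entrypoint=run_script, runtime_env={"working_dir": "../src/"})
# Print the submit time so it can be compared with the first timestamp in main.py
print(datetime.now().strftime("%Y-%m-%d-%H:%M:%S"), job_id)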

Here, I print the time right after submitting the job, and I also print the time on the first line of main.py. Comparing the two, it takes around 8 seconds after submit_job before main.py starts executing. I have also checked the log file (job-driver-raysubmit_*.log), and it shows the same 8-second gap between the submit time and the first log timestamp.
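To narrow down where the gap comes from, the time the job spends in the PENDING state can be measured from the submitting side. The following is only a minimal sketch: seconds_until_started is a hypothetical helper, not part of Ray, and it assumes the status returned by client.get_job_status compares equal to the string "PENDING" while the job has not started yet.

```python
import time


def seconds_until_started(get_status, poll_interval=0.1, timeout=60.0):
    """Poll get_status() until it stops returning PENDING.

    Returns the elapsed seconds, or None if the job is still PENDING
    after `timeout` seconds. Hypothetical helper, not a Ray API.
    """
    started = time.perf_counter()
    while time.perf_counter() - started < timeout:
        # Assumption: the status compares equal to the plain string "PENDING"
        # (Ray's JobStatus is, as far as I know, a str-based enum).
        if get_status() != "PENDING":
            return time.perf_counter() - started
        time.sleep(poll_interval)
    return None
```

With the client and job_id from the snippet above, this could be called right after submit_job as seconds_until_started(lambda: client.get_job_status(job_id)). The result is only as precise as poll_interval, so it gives a rough split between job startup and the time spent inside main.py.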

I am using cluster.yaml to start the cluster, and all nodes are on-premises.

Things checked:

  • Clocks are in sync on all nodes.

Can anyone please explain why it’s taking this much time?

@shyampatel Job submission does some preparation work before it starts running the actual code.

Things like:

  • creating a coordinator actor
  • uploading the working dir and downloading it
  • preparing the environment

All of these take time. If you want it to be as fast as possible, maybe you should run the script directly as a driver and avoid using a runtime env, putting the dir on all nodes instead.
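As a rough illustration of what running it as a driver could look like: the script is started with plain python3 on a machine that is already part of the cluster and connects with ray.init, so no job is submitted and no runtime_env is passed. This is a sketch under assumptions, not a definitive setup. run_as_driver and its init parameter are hypothetical names, and it assumes the code is already present on every node.

```python
import time
from datetime import datetime


def run_as_driver(main, address="auto", init=None):
    """Connect this process to a running Ray cluster and call main() in it.

    No job is submitted and no runtime_env is passed, so nothing is
    uploaded or downloaded before main() runs.
    `init` is a hypothetical hook so the sketch can be tried without a
    cluster; by default it is ray.init.
    """
    if init is None:
        import ray  # imported lazily so this module loads without Ray installed
        init = ray.init

    started = time.perf_counter()
    init(address=address)  # "auto" attaches to the cluster running on this node
    connect_seconds = time.perf_counter() - started

    stamp = datetime.now().strftime("%Y-%m-%d-%H:%M:%S")
    print(stamp, f"connected in {connect_seconds:.2f}s")
    return main()
```

Here main would be whatever main.py currently does at the top level, wrapped in a function. The printed connect time can be compared with the 8 seconds seen through the job submission path.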

@yic Thanks for your reply. Yes, we have tried putting the working dir on all nodes to reduce the upload and download time. Can you please elaborate on the idea of

?

@yic Can you please give a suggestion on this?