Can I specify runtime env in JobSubmissionClient?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I want to use JobSubmissionClient to submit jobs to a remote cluster, and all of the jobs have the same runtime env. If I submit them one by one, the runtime env has to be handled every time (after the first submission it checks that the env is the same and skips the upload, but that still takes a lot of time). With hundreds or thousands of jobs, is there any way to specify the runtime env once for all the jobs I am submitting, so that the env is only checked one time?

I don’t see an arg for runtime env in JobSubmissionClient.

Maybe this one?

I could also find it in the API doc: Python SDK API Reference — Ray 3.0.0.dev0
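For reference, here is a minimal sketch of passing the same runtime env to every job through the `runtime_env` argument of `submit_job`. The helper names `connect` and `submit_jobs`, the address, and the entrypoints are hypothetical and only for illustration. Note that this still passes the env with each submission; it does not make the check happen only once.

```python
from typing import Any, Dict, List


def connect(address: str) -> Any:
    """Create a client for the cluster at `address` (requires Ray to be installed)."""
    # Imported lazily so the helper below can be used without importing Ray up front.
    from ray.job_submission import JobSubmissionClient

    return JobSubmissionClient(address)


def submit_jobs(client: Any, entrypoints: List[str], runtime_env: Dict[str, Any]) -> List[str]:
    """Submit every entrypoint with the same runtime_env and return the job IDs."""
    job_ids = []
    for entrypoint in entrypoints:
        # runtime_env is an argument of submit_job, not of the client constructor.
        job_ids.append(client.submit_job(entrypoint=entrypoint, runtime_env=runtime_env))
    return job_ids


# Example usage (hypothetical address, entrypoints, and env contents):
# client = connect("http://127.0.0.1:8265")
# shared_env = {"working_dir": "./", "pip": ["requests"]}
# ids = submit_jobs(client, [f"python task.py --index {i}" for i in range(100)], shared_env)
```

Building the env dict once and reusing it keeps all submissions identical, which is what lets the upload be skipped after the first job.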

Thanks. With your suggested approach, if I have 100 jobs to submit and all of them have the same runtime env, I would need to submit the runtime env 100 times, which adds significant waiting time (it skips the upload for the other 99 identical runtime envs, but the check itself takes time). So I am wondering: can I specify the runtime env for ALL the jobs I am going to submit, and upload it just once for all 100 jobs?

Hmm, how long does the “time to check” take? As you said, after the first upload it shouldn’t re-upload the runtime env, and usually that should be sufficient for performance. cc @architkulkarni, do you have a good answer for the question above?

I think it takes roughly 5-10s per job. I usually submit hundreds of jobs, e.g. 500, so that’s a lot of wait time.

But I think the job is already running or queued at the backend by then, so I am not really “waiting” anyway. On second thought, maybe this is fine :slight_smile:

Hi @Allie_Yang, sorry you’re running into this issue. The runtime_env should be cached automatically. Can you share more details about which runtime_env you’re using? Also, do you observe the 5-10s delay per job only when using runtime_env? If so, are you able to share the output of dashboard_agent.log, or check runtime_env_setup-*.log? These should give more insight into what’s going on. The logs are located at /tmp/ray/session_latest/logs on the head node.
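A small sketch for collecting the log files mentioned above, to be run on the head node. The function name `find_runtime_env_logs` is hypothetical; the directory and file patterns are the ones given in this post.

```python
import glob
import os


def find_runtime_env_logs(log_dir="/tmp/ray/session_latest/logs"):
    """Return sorted paths of dashboard_agent.log and runtime_env_setup-*.log in log_dir."""
    patterns = ["dashboard_agent.log", "runtime_env_setup-*.log"]
    found = []
    for pattern in patterns:
        # glob returns an empty list when nothing matches or the directory is missing.
        found.extend(glob.glob(os.path.join(log_dir, pattern)))
    return sorted(found)


for path in find_runtime_env_logs():
    print(path)
```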

Hi Archit, I don’t think it is a delay. The jobs are successfully submitted to the queue. But the message below is printed on the terminal for 99 of the 100 jobs, which is not needed and makes me wait for the printing to finish.

INFO:ray.dashboard.modules.dashboard_sdk:Package gcs:// already exists, skipping upload.

My runtime env has ~90 pip packages and a 50 MB working_dir.
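One possible way to quiet the repeated "skipping upload" line is to raise the level of the logger that emits it. This is a sketch only: it assumes the message goes through Python's standard `logging` module under the logger name shown in the line's prefix (`ray.dashboard.modules.dashboard_sdk`) and that nothing resets the level afterwards. The function name `quiet_upload_messages` is hypothetical.

```python
import logging

# Logger name taken from the prefix of the printed line:
# INFO:ray.dashboard.modules.dashboard_sdk:Package ... already exists, skipping upload.
SDK_LOGGER_NAME = "ray.dashboard.modules.dashboard_sdk"


def quiet_upload_messages(level=logging.WARNING):
    """Hide INFO messages from the SDK logger while keeping warnings and errors."""
    logger = logging.getLogger(SDK_LOGGER_NAME)
    logger.setLevel(level)
    return logger


# Call this once in the submitting script before the loop that submits jobs.
quiet_upload_messages()
```

This only changes what is printed in the submitting process; the existence check itself still runs for every job.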