Submitted containerized job is stuck in pending mode

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi, I want to use Ray to submit containerized jobs to a Kubernetes cluster. Scheduling non-containerized jobs works fine, but as soon as I submit a containerized job, it stays in the PENDING state indefinitely. The command below submits the job successfully, but the job never starts.

ray job submit --address http://localhost:8265 --runtime-env-json='{"container": {"image": "<my-cuda-docker-image>", "worker_path": "/root"}}' -- nvidia-smi
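Hand-writing JSON inside `--runtime-env-json` is easy to get wrong because of shell quoting, so it can help to generate the argument instead. The following is a minimal Python sketch that assumes only the two `container` fields used in the command above (`image` and `worker_path`). `container_runtime_env` and `runtime_env_flag` are hypothetical helper names, and the image name is the same placeholder as in the original command.

```python
import json
import shlex


def container_runtime_env(image: str, worker_path: str) -> dict:
    """Build a runtime_env dict with the "container" field used above.

    Only "image" and "worker_path" are set; both values are placeholders.
    """
    return {"container": {"image": image, "worker_path": worker_path}}


def runtime_env_flag(runtime_env: dict) -> str:
    # json.dumps produces valid JSON (double quotes, proper escaping);
    # shlex.quote wraps it so a POSIX shell passes it through unchanged.
    return "--runtime-env-json=" + shlex.quote(json.dumps(runtime_env))


env = container_runtime_env("<my-cuda-docker-image>", "/root")
print("ray job submit --address http://localhost:8265",
      runtime_env_flag(env), "-- nvidia-smi")
```

The printed line matches the submit command above. The same dictionary can also be passed as the `runtime_env` argument of `JobSubmissionClient.submit_job` in Ray's Python SDK, which avoids shell quoting entirely.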

Job submission server address: http://localhost:8265

-------------------------------------------------------
Job 'raysubmit_KKgyZumhXYm1y3ng' submitted successfully
-------------------------------------------------------

Next steps
  Query the logs of the job:
    ray job logs raysubmit_KKgyZumhXYm1y3ng
  Query the status of the job:
    ray job status raysubmit_KKgyZumhXYm1y3ng
  Request the job to be stopped:
    ray job stop raysubmit_KKgyZumhXYm1y3ng

Tailing logs until the job exits (disable with --no-wait)

Checking the job status confirms this issue.

ray job status raysubmit_KKgyZumhXYm1y3ng --address http://localhost:8265

Status for job 'raysubmit_KKgyZumhXYm1y3ng': PENDING
Status message: Job has not started yet. It may be waiting for the runtime environment to be set up.
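Instead of tailing logs or re-running `ray job status` indefinitely, a client can give up after a fixed time and report that the job appears stuck. This is a minimal sketch, assuming `get_status` is any callable that returns the job status as a string such as "PENDING". `wait_while_pending` and its timeout are hypothetical and not part of Ray.

```python
import time
from typing import Callable


def wait_while_pending(get_status: Callable[[], str],
                       timeout_s: float,
                       poll_s: float = 1.0) -> str:
    """Poll get_status() until it stops returning "PENDING".

    Returns the first non-PENDING status, or raises TimeoutError if the
    job is still PENDING after timeout_s seconds (instead of waiting forever).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != "PENDING":
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"job still PENDING after {timeout_s} s")
        # Never sleep past the deadline.
        time.sleep(min(poll_s, remaining))
```

With Ray's Python SDK, `get_status` could be something like `lambda: client.get_job_status(job_id).value`, where `client = JobSubmissionClient("http://localhost:8265")`. Treat that wiring as an assumption to check against your Ray version.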

Stopping the submitted job does not work either: the stop command waits indefinitely while the job stays PENDING, and it effectively breaks my Ray cluster.

ray job stop raysubmit_KKgyZumhXYm1y3ng --address http://localhost:8265

Job submission server address: http://localhost:8265
Attempting to stop job 'raysubmit_KKgyZumhXYm1y3ng'
Waiting for job 'raysubmit_KKgyZumhXYm1y3ng' to exit (disable with --no-wait):
Job has not exited yet. Status: PENDING
Job has not exited yet. Status: PENDING
Job has not exited yet. Status: PENDING
Job has not exited yet. Status: PENDING
Job has not exited yet. Status: PENDING
Job has not exited yet. Status: PENDING
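The stop command above blocks until the job exits (its output notes that this can be disabled with `--no-wait`). A bounded alternative is to request the stop once and then poll for a terminal status until a deadline. This sketch assumes the terminal statuses are STOPPED, SUCCEEDED and FAILED, which are the terminal values of Ray's `JobStatus`, and that you supply `stop` and `get_status` as callables. `stop_and_wait` is a hypothetical helper.

```python
import time
from typing import Callable

# Terminal job states; PENDING and RUNNING are non-terminal.
TERMINAL = {"STOPPED", "SUCCEEDED", "FAILED"}


def stop_and_wait(stop: Callable[[], object],
                  get_status: Callable[[], str],
                  timeout_s: float,
                  poll_s: float = 1.0) -> str:
    """Request a stop once, then poll for at most about timeout_s seconds.

    Returns the last observed status. If the deadline passes, this is still
    a non-terminal value such as "PENDING", so the caller can decide what to do.
    """
    stop()
    deadline = time.monotonic() + timeout_s
    status = get_status()
    while status not in TERMINAL and time.monotonic() < deadline:
        time.sleep(poll_s)
        status = get_status()
    return status
```

With the Python SDK, `stop` might be `lambda: client.stop_job(job_id)`. A result of "PENDING" after the deadline reproduces the hang shown above without blocking the caller.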

Has anyone run into this? Any help with this issue would be appreciated. 🙂

Hi @stn73,

Sorry for the late reply. This feature is currently experimental, so it may have some rough edges. Which Ray version are you using?

Hi @jjyao, thanks for responding!

I also opened an issue on GitHub with some additional information, since I believe this is a bug; see here.

I am using Ray version 2.5.1.