I added the following code at the end of JobSupervisor#__init__:
return_cmd = subprocess.run(
    self._entrypoint,
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    executable="/bin/bash",
)
self._logger.info(return_cmd)
if return_cmd.returncode == 0:
    self._logger.info("!!!!!!!!")
    ret_val = return_cmd.stdout
else:
    self._logger.info("###########")
The log in worker-xxx.err shows returncode=0:
2025-04-01 11:21:10,786 INFO job_supervisor.py:139 -- CompletedProcess(args='python image_recognition_single_bak.py', returncode=0, stdout=b'', stderr=b'2025-04-01 11:21:10,123\tINFO worker.py:1494 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS\n2025-04-01 11:21:10,123\tINFO worker.py:1634 -- Connecting to existing Ray cluster at address: 10.7.8.75:6379...\n2025-04-01 11:21:10,130\tINFO worker.py:1810 -- Connected to Ray cluster. View the dashboard at \x1b[1m\x1b[32m10.7.8.75:8265 \x1b[39m\x1b[22m\n[2025-04-01 11:21:10,137 I 334426 334426] logging.cc:293: Set ray log level from environment variable RAY_BACKEND_LOG_LEVEL to -1\n') job_id=02000000 worker_id=db1839d26b8be781bfb40444e5937c9c8b18f94c6e17a7490ce0998f node_id=d3bf92cf84d0278ffada4c21fed52e68e7fced5a7e5ecb08b496dcc7 actor_id=f3f56488d615bd2d398ad98b02000000 task_id=fffffffffffffffff3f56488d615bd2d398ad98b02000000
2025-04-01 11:21:10,787 INFO job_supervisor.py:141 -- !!!!!!!! job_id=02000000 worker_id=db1839d26b8be781bfb40444e5937c9c8b18f94c6e17a7490ce0998f node_id=d3bf92cf84d0278ffada4c21fed52e68e7fced5a7e5ecb08b496dcc7 actor_id=f3f56488d615bd2d398ad98b02000000 task_id=fffffffffffffffff3f56488d615bd2d398ad98b02000000
I then put the same subprocess.run code in a new standalone Python file and ran it. There the result is returncode=1:
CompletedProcess(args='python image_recognition_single_bak.py', returncode=1, stdout='', stderr='2025-04-01 11:49:59,368\tINFO worker.py:1494 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS\n2025-04-01 11:49:59,368\tINFO worker.py:1634 -- Connecting to existing Ray cluster at address: 10.7.8.75:6379...\n2025-04-01 11:49:59,377\tINFO worker.py:1810 -- Connected to Ray cluster. View the dashboard at \x1b[1m\x1b[32m10.7.8.75:8265 \x1b[39m\x1b[22m\n[2025-04-01 11:49:59,383 I 342378 342378] logging.cc:293: Set ray log level from environment variable RAY_BACKEND_LOG_LEVEL to -1\n')
###########
The content of image_recognition_single_bak.py is simple:
import sys

sys.exit(1)
# or raise xxx
# or import not_exist_module
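
For reference, here is a minimal self-contained sketch of the standalone check, which anyone can run without Ray. It writes a small failing script to a temporary directory and runs it through the shell with the same subprocess.run arguments as above. The helper names run_entrypoint and exit_code_of and the temporary script name are hypothetical, made up for this illustration; they are not part of Ray. It also falls back to /bin/sh if bash is not found, which is an assumption added for portability.

```python
import shlex
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_entrypoint(entrypoint: str) -> subprocess.CompletedProcess:
    """Run a shell command with the same arguments as the snippet above."""
    # Assumption: prefer bash as in the original code, fall back to /bin/sh.
    shell = shutil.which("bash") or "/bin/sh"
    return subprocess.run(
        entrypoint,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        executable=shell,
    )


def exit_code_of(script_body: str) -> int:
    """Write script_body to a temporary file, run it, and return its exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "failing_script.py"  # hypothetical file name
        script.write_text(script_body)
        # Quote both parts so paths containing spaces survive the shell.
        cmd = f"{shlex.quote(sys.executable)} {shlex.quote(str(script))}"
        return run_entrypoint(cmd).returncode


if __name__ == "__main__":
    # The three failure variants mentioned above.
    bodies = (
        "import sys\nsys.exit(1)\n",
        "raise RuntimeError('boom')\n",
        "import not_exist_module\n",
    )
    for body in bodies:
        print(repr(body), "->", exit_code_of(body))
```

Outside of the job supervisor, each of these variants should report a non-zero exit code, which matches the standalone result above and is what I expected to see inside JobSupervisor as well.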