import ray

ray.init()

@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(2)]
print(ray.get(futures))
I get some output, but the process never seems to finish and I don't get the result I expect. What's wrong? Is there some setup I haven't done?
This is the output:
Job submission server address: http://192.168.0.192:8265
2022-08-08 10:08:21,309 INFO dashboard_sdk.py:272 -- Uploading package gcs://_ray_pkg_698a6544fb43c3a9.zip.
2022-08-08 10:08:21,310 INFO packaging.py:479 -- Creating a file package for local directory './'.
-------------------------------------------------------
Job 'raysubmit_YVTpnmAzV7ysFmpk' submitted successfully
-------------------------------------------------------
Next steps
Query the logs of the job:
ray job logs raysubmit_YVTpnmAzV7ysFmpk
Query the status of the job:
ray job status raysubmit_YVTpnmAzV7ysFmpk
Request the job to be stopped:
ray job stop raysubmit_YVTpnmAzV7ysFmpk
Tailing logs until the job exits (disable with --no-wait):
Hi @rickyyx, thanks! Yes, it hangs and stays stuck at that output.
The output looks like this:
$ ray job logs raysubmit_SpYXzcr6hVkq973i
Job submission server address: None
2022-08-09 10:55:08,772 INFO dashboard_sdk.py:129 -- No address provided, defaulting to http://localhost:8265.
$ ray job status raysubmit_SpYXzcr6hVkq973i
Job submission server address: None
2022-08-09 10:55:36,855 INFO dashboard_sdk.py:129 -- No address provided, defaulting to http://localhost:8265.
Status for job 'raysubmit_SpYXzcr6hVkq973i': PENDING
Status message: Job has not started yet, likely waiting for the runtime_env to be set up.
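For anyone who wants to watch a job that sits in PENDING like this without re-running the CLI by hand, here is a minimal sketch of a polling helper. It assumes the Ray Jobs Python SDK (`JobSubmissionClient.get_job_status`) is available on the machine. The helper names `wait_for_job` and `ray_status_getter`, the timeout values, and the default address are illustrative choices, not something from this thread.

```python
import time

# Job states after which the status will no longer change.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(get_status, timeout_s=60.0, poll_s=2.0):
    """Poll get_status() until a terminal state or the timeout; return the last status."""
    deadline = time.monotonic() + timeout_s
    status = get_status()
    # Accept either plain strings or enum members that carry a .value.
    status = str(getattr(status, "value", status))
    while status not in TERMINAL_STATES and time.monotonic() < deadline:
        time.sleep(poll_s)
        status = get_status()
        status = str(getattr(status, "value", status))
    return status


def ray_status_getter(job_id, address="http://127.0.0.1:8265"):
    """Hypothetical wiring to the Ray Jobs SDK; the address is an assumption."""
    # Imported lazily so wait_for_job can be used without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return lambda: client.get_job_status(job_id)
```

With Ray installed, `wait_for_job(ray_status_getter("raysubmit_SpYXzcr6hVkq973i"))` would return the last status seen. If that is still PENDING after the timeout, the runtime_env setup logs on the head node are the next place to look.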
My Ray version is 3.0.0.dev0, and I have SSH access to the head node.
I also found some messages in raylet.err and runtime_env_setup-01000000.log:
$ cat raylet.err
bash: line 0: exec: podman: not found
[2022-08-09 10:26:25,244 E 494944 494944] (raylet) worker_pool.cc:500: Some workers of the worker process(506291) have not registered within the timeout. The process is dead, probably it crashed during start.
$ cat runtime_env_setup-01000000.log
2022-08-09 10:27:25,248 INFO container.py:47 -- start worker in container with prefix: podman run -v /tmp/ray:/tmp/ray --cgroup-manager=cgroupfs --network=host --pid=host --ipc=host --env-host --env RAY_RAYLET_PID=494944 --cap-drop SYS_ADMIN --log-level=debug --entrypoint python anyscale/ray-ml:nightly-py38-cpu
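The raylet.err message above says that `podman` could not be executed on the node, so a quick preflight check on each node may save some time. This is only a sketch using the standard library. The function name `missing_executables` is made up for illustration, and it only checks the PATH of the process that runs it, which may differ from the PATH the raylet sees.

```python
import shutil


def missing_executables(names=("podman",)):
    """Return the names from `names` that cannot be found on this process's PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables found on PATH.")
```

Running this on the head node and every worker node, ideally as the same user that started Ray, shows whether the container runtime is visible there.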
Over the past two days I tried to solve this problem: I installed Podman locally, manually pulled the image before submitting the job, and so on. But a new issue is holding me back at the moment:
time="2022-08-11T09:36:46+08:00" level=warning msg="Error validating CNI config file /home/wangjie/.config/cni/net.d/87-podman.conflist: [netplugin failed with no error message: fork/exec /opt/cni/bin/bridge: exec format error netplugin failed with no error message: fork/exec /opt/cni/bin/portmap: exec format error netplugin failed with no error message: fork/exec /opt/cni/bin/firewall: exec format error netplugin failed with no error message: fork/exec /opt/cni/bin/tuning: exec format error]"
Error: executable file `python` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found
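One way to narrow down the `python` not found error is to ask the image directly where `python` resolves, outside of Ray. The sketch below only builds and runs a `podman run` command. The function names are hypothetical, and the optional `--env-host` switch is there because that flag appears in the runtime_env setup log. Whether it actually plays a role in this error is an assumption, not something confirmed in this thread.

```python
import shutil
import subprocess


def python_lookup_command(image, env_host=False):
    """Build a podman command that asks the image where `python` resolves."""
    cmd = ["podman", "run", "--rm"]
    if env_host:
        # Mirrors the --env-host flag seen in the runtime_env setup log.
        cmd.append("--env-host")
    # Override the entrypoint with a shell so the lookup itself cannot fail
    # just because `python` is missing from PATH.
    cmd += ["--entrypoint", "sh", image, "-c", "command -v python"]
    return cmd


def find_python_in_image(image, env_host=False):
    """Return the path reported by the container, or None if the lookup fails."""
    if shutil.which("podman") is None:
        return None
    result = subprocess.run(
        python_lookup_command(image, env_host),
        capture_output=True,
        text=True,
    )
    path = result.stdout.strip()
    return path if result.returncode == 0 and path else None
```

Comparing `find_python_in_image("anyscale/ray-ml:nightly-py38-cpu")` with the same call using `env_host=True` would show whether the result changes when the host environment is passed into the container.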
I'm sorry to say that container support is still experimental and is unfortunately broken in the latest Ray versions. We will improve this part in a few months and write clear documentation for it.
Hi @GuyangSong, any update on this? I just tried to start a local Ray cluster via ray.init with a local image (for automated testing), and ran into the same issue:
Error: executable file python not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found