I tried to install azure-common manually (via pip), but it is already installed.
I get the same error if I try to call ray.init(address='auto') in a Python script.
Furthermore, I noticed that this setup only creates a head node, but none of the worker nodes described in the tutorial.
How can I get ray status and ray.init to work? Is the problem with the worker nodes potentially linked to this issue?
Hi, there have been a few updates since that post was made. The latest example_full.yaml actually deploys a Docker image to the nodes and doesn’t use the base Linux environment of the DSVM image (so those conda envs don’t exist). I’ve created a PR here which updates a few things in the yamls and the Azure node provider to work with changes to the Azure SDK functions.
Please let me know if the example_full.yaml used there works for you.
As for the worker nodes, they are deployed from the head node, so if something was failing there that could explain why there are no worker nodes. Also, the default minimum number of workers is 0 so it will only deploy workers once there are processes requiring the autoscaler to scale the cluster up.
You can force a certain number of workers using the min_workers property in the yaml file:
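A rough sketch of where that property goes (the key nesting shown here is an assumption — older yamls take min_workers at the top level, newer ones nest it under each worker node type, so check the schema your Ray version ships with):

```yaml
# Sketch only -- field placement depends on the yaml schema of your Ray version.
max_workers: 2

available_node_types:
    ray.worker.default:
        min_workers: 1   # always keep at least one worker node running
        max_workers: 2
```

With min_workers set above 0, the autoscaler launches that many workers immediately instead of waiting for resource demand.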
I have kept a close eye on the github issue you mentioned, and now that it has been merged, I installed the latest (nightly) wheel in the cloud shell and also on the head node. However, my problem persists. Curiously, I still get this error, even though I can verify that the folder has been renamed from azure to _azure.
I have tried to uninstall and reinstall ray several times, also in a separate conda environment, and tried building it from the git repository, but the error I receive does not change.
I have also tried to force the creation of a worker with min_workers: 1, but no worker was created.
What else could I try?
Update:
ray status is working now.
Using the newest version of the example yaml file, I changed the imageVersion to latest and the Docker image to rayproject/ray-ml:nightly-py37-gpu.
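Concretely, the two edits in the yaml looked roughly like this (the nesting of imageVersion under azure_arm_parameters is an assumption on my part — it may sit elsewhere depending on the yaml version, so check your own file):

```yaml
# Sketch of the two changes; key nesting may differ between yaml versions.
docker:
    image: "rayproject/ray-ml:nightly-py37-gpu"

available_node_types:
    ray.head.default:
        node_config:
            azure_arm_parameters:
                imageVersion: latest
```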
Thank you