Azure autoscaler dependency error

Hello,

I am using Ray’s automatic cluster setup inside a Docker container on Azure. It successfully sets up the head node and starts the Ray runtime. However, I get the following error during autoscaling, even though azure-common is installed in the conda environment and I can import the package in an interactive Python shell.

Any suggestions would be much appreciated. Thanks!

  File "/opt/conda/lib/python3.6/site-packages/ray/autoscaler/_private/monitor.py", line 284, in run
    self._initialize_autoscaler()
  File "/opt/conda/lib/python3.6/site-packages/ray/autoscaler/_private/monitor.py", line 133, in _initialize_autoscaler
    event_summarizer=self.event_summarizer)
  File "/opt/conda/lib/python3.6/site-packages/ray/autoscaler/_private/autoscaler.py", line 86, in __init__
    self.reset(errors_fatal=True)
  File "/opt/conda/lib/python3.6/site-packages/ray/autoscaler/_private/autoscaler.py", line 537, in reset
    raise e
  File "/opt/conda/lib/python3.6/site-packages/ray/autoscaler/_private/autoscaler.py", line 495, in reset
    self.config["cluster_name"])
  File "/opt/conda/lib/python3.6/site-packages/ray/autoscaler/_private/providers.py", line 186, in _get_node_provider
    provider_cls = _get_node_provider_cls(provider_config)
  File "/opt/conda/lib/python3.6/site-packages/ray/autoscaler/_private/providers.py", line 162, in _get_node_provider_cls
    return importer(provider_config)
  File "/opt/conda/lib/python3.6/site-packages/ray/autoscaler/_private/providers.py", line 38, in _import_azure
    from ray.autoscaler._private.azure.node_provider import AzureNodeProvider
  File "/opt/conda/lib/python3.6/site-packages/ray/autoscaler/_private/azure/node_provider.py", line 7, in <module>
    from azure.common.client_factory import get_client_from_cli_profile
ModuleNotFoundError: No module named 'azure.common'

This looks like a conda environment isolation issue.

Could you try running /opt/conda/bin/python and checking whether azure.common imports there?
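
One way to run that check is a small script that reports which interpreter is executing and where the module resolves from. This is only a sketch: the helper name describe_module is made up for illustration, and it relies solely on the standard library.

```python
import importlib.util
import sys


def describe_module(name):
    """Report the running interpreter and where `name` resolves from, if anywhere."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. "azure") cannot be imported at all.
        spec = None
    return {
        "executable": sys.executable,          # interpreter actually running this code
        "found": spec is not None,             # True if the module can be located
        "origin": getattr(spec, "origin", None),  # file path of the module, if known
    }


if __name__ == "__main__":
    print(describe_module("azure.common"))
```

Running it once with /opt/conda/bin/python and once with whatever interpreter launches the monitor process should show whether the two see the same site-packages.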

I think you may need to add the installation commands to setup_commands in your cluster YAML.
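
If you do add install commands, it may help to bind pip to an explicit interpreter so the package lands in the same environment that Ray runs in. A rough sketch of building such a command string follows; pip_install_command is a hypothetical helper, the interpreter path is taken from the traceback, and azure-common is the only package named here, so the full list of Azure dependencies is left for you to fill in.

```python
import shlex


def pip_install_command(python_path, packages):
    """Build a shell command that installs `packages` with a specific interpreter."""
    parts = [python_path, "-m", "pip", "install"] + list(packages)
    # Quote each part so the result is safe to paste into a shell command.
    return " ".join(shlex.quote(part) for part in parts)


# Interpreter path from the traceback; package name from the original post.
print(pip_install_command("/opt/conda/bin/python", ["azure-common"]))
```

The resulting string could then be pasted as one entry under setup_commands. Whether that alone fixes the import depends on which interpreter the monitor process actually uses.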

Thanks for your reply.

I can successfully import azure.common when I run /opt/conda/bin/python. I also tried installing all the Azure dependencies by adding them to setup_commands, but that did not work.

The issue can be reproduced by setting min_workers: 1 in example-full-legacy.yaml.