Autoscaler on K8s import client error

I have a K8s cluster running and am trying to deploy rayproject/ray-ml:nightly nodes for both head and workers. Currently, when I run ray monitor cluster.yaml after ray up completes, I see the following stack trace:

2021-04-01 19:28:44,147 ERROR monitor.py:245 -- Error in monitor loop
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/monitor.py", line 276, in run
    self._initialize_autoscaler()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/monitor.py", line 125, in _initialize_autoscaler
    event_summarizer=self.event_summarizer)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/autoscaler.py", line 86, in __init__
    self.reset(errors_fatal=True)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/autoscaler.py", line 521, in reset
    raise e
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/autoscaler.py", line 479, in reset
    self.config["cluster_name"])
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/providers.py", line 186, in _get_node_provider
    provider_cls = _get_node_provider_cls(provider_config)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/providers.py", line 162, in _get_node_provider_cls
    return importer(provider_config)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/providers.py", line 54, in _import_kubernetes
    from ray.autoscaler._private.kubernetes.node_provider import \
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/kubernetes/__init__.py", line 1, in <module>
    import kubernetes
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/kubernetes/__init__.py", line 2, in <module>
    from kubernetes.config.config_exception import ConfigException
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/kubernetes/config.py", line 6, in <module>
    from kubernetes import client
ImportError: cannot import name 'client' from 'kubernetes' (/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/kubernetes/__init__.py)

I’ve connected to the container, started python, and was able to both import kubernetes and run from kubernetes import client, so I’m out of ideas for what to try next.

The installed kubernetes version is

>>> kubernetes.__version__
'12.0.1'
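One thing that stands out in the traceback: the path in the ImportError points at Ray's own autoscaler/_private/kubernetes/__init__.py rather than the pip-installed client, which suggests (though I haven't confirmed it) that the name kubernetes is being resolved to Ray's internal package inside the monitor process. Below is a minimal diagnostic sketch for checking where a module name resolves from; locate_module is a hypothetical helper name, not part of Ray or the kubernetes client. To be meaningful it would need to run in the same environment and with the same sys.path as the monitor, not just in an interactive shell.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module name resolves to, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # If this prints a path under ray/autoscaler/_private/, the real client is being shadowed.
    print("kubernetes resolves to:", locate_module("kubernetes"))
    # Assumption: a sys.path entry inside Ray's autoscaler directory would be one
    # way for the shadowing to happen, so list any such entries.
    for entry in sys.path:
        if "autoscaler" in entry:
            print("suspicious sys.path entry:", entry)
```

In an interactive session this would normally print the site-packages/kubernetes/__init__.py path, which would match the observation that the import works by hand.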

If I drop back to rayproject/ray-ml:1.3.0 I don’t see this error, but I run into different rllib errors.

Follow up -

Tried going down to rayproject/ray-ml:latest, but it doesn’t have the tune.durable that I’m trying to leverage. As a next step, I’m going to try a build from a few days ago.

Edit: Update for tonight: FROM rayproject/ray-ml:73fb5d works. I had to go all the way back to this build.

@Ameer_Haj_Ali it looks like the autoscaler can’t find a library, maybe due to environment issues. Could you have a look at it?

CC @Dmitri , can you please help?

I’m looking into it.

Yep, replicated the error. That’s very odd. I’ll open an issue on GitHub and start working on a fix immediately.

Also, I recommend using the Ray Kubernetes Operator – you can use the default rayproject/ray:nightly image in the operator pod (which runs the autoscaler in this setup) and use the rayproject/ray-ml:nightly for the head and workers.

The operator is documented here:
https://docs.ray.io/en/master/cluster/kubernetes.html

Issue tracked here:

Thanks. I’m working through the operator examples and don’t see a great way to run a more complex workload that has multiple Python files importing each other. Is the thinking that we build our own Docker images that extend nightly, put the code we need into them, and submit K8s jobs?

Yep, that’s currently the intended workflow.

There’s work in progress that will allow Ray to automatically handle syncing of modules between nodes – see the discussion here: [autoscaler] [kubernetes] Calling ray down does not remove Kubernetes services · Issue #14700 · ray-project/ray · GitHub .