Autoscaler on K8s import client error

wkerr-riot · April 2, 2021, 2:35am

I have a K8s cluster running and trying to deploy rayproject/ray-ml:nightly nodes for both head and workers. Currently when I run ray monitor cluster.yaml I’m seeing the following stack trace after ray up completes

2021-04-01 19:28:44,147 ERROR monitor.py:245 -- Error in monitor loop
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/monitor.py", line 276, in run
    self._initialize_autoscaler()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/monitor.py", line 125, in _initialize_autoscaler
    event_summarizer=self.event_summarizer)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/autoscaler.py", line 86, in __init__
    self.reset(errors_fatal=True)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/autoscaler.py", line 521, in reset
    raise e
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/autoscaler.py", line 479, in reset
    self.config["cluster_name"])
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/providers.py", line 186, in _get_node_provider
    provider_cls = _get_node_provider_cls(provider_config)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/providers.py", line 162, in _get_node_provider_cls
    return importer(provider_config)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/providers.py", line 54, in _import_kubernetes
    from ray.autoscaler._private.kubernetes.node_provider import \
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/kubernetes/__init__.py", line 1, in <module>
    import kubernetes
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/kubernetes/__init__.py", line 2, in <module>
    from kubernetes.config.config_exception import ConfigException
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/kubernetes/config.py", line 6, in <module>
    from kubernetes import client
ImportError: cannot import name 'client' from 'kubernetes' (/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/kubernetes/__init__.py)

I’ve connected to the container, run python and was able to import both kubernetes and able to run from kubernetes import client, so I’m fresh out of options what to do next.

The installed kubernetes version is

>>> kubernetes.__version__
'12.0.1'

If I drop back to rayproject/ray-ml:1.3.0 I don’t see this error, but I run into different rllib errors.

wkerr-riot · April 2, 2021, 3:14am

Follow up -

Tried going down to rayproject/ray-ml:latest and it doesn’t have the tune.durable that I’m trying to leverage. Next step I’m going to try going to a build from a few days ago.

Edit: Update for tonight FROM rayproject/ray-ml:73fb5d going all the way back to this one works.

yic · April 2, 2021, 8:06pm

@Ameer_Haj_Ali it looks like autoscaler can’t find some library. Maybe due to env issues. Could you have a look at it?

Ameer_Haj_Ali · April 2, 2021, 8:19pm

CC @Dmitri , can you please help?

Dmitri · April 2, 2021, 9:06pm

I’m looking into it.

Dmitri · April 2, 2021, 9:10pm

Yep, replicated the error. That’s very odd – will open an issue on github and immediately start working on a fix.

Dmitri · April 2, 2021, 9:15pm

Also, I recommend using the Ray Kubernetes Operator – you can use the default rayproject/ray:nightly image in the operator pod (which runs the autoscaler in this setup) and use the rayproject/ray-ml:nightly for the head and workers.

The operator is documented here:
https://docs.ray.io/en/master/cluster/kubernetes.html

Issue tracked here:

wkerr-riot · April 2, 2021, 9:59pm

Thanks, working through the operator examples and I don’t see a great way to run a more complex workload that has multiple python files importing each other. Is the thinking that we build our own docker containers that extend nightly and put the code we need onto it and submit K8s jobs?

Dmitri · April 3, 2021, 3:07am

Yep, that’s currently the intended workflow.

There’s work in progress that will allow Ray to automatically handle syncing of modules between nodes – see the discussion here: [autoscaler] [kubernetes] Calling ray down does not remove Kubernetes services · Issue #14700 · ray-project/ray · GitHub .

Topic		Replies	Views
ModuleNotFoundError for ray.autoscaler._private._kubernetes Kubernetes	0	474	June 22, 2023
Testing autoscaler Kubernetes	15	1527	March 16, 2021
Autoscaler container restarts with requests.exceptions.ConnectionError Kubernetes	1	58	July 28, 2024
Autoscaler failing on minikube Kubernetes	2	525	April 26, 2021
[Autoscaler] Does Ray still use autoscaler when starting the cluster manually? Ray Clusters	9	575	September 25, 2021

Autoscaler on K8s import client error

Related topics