How do I use my own Docker image to run a local on-premise cluster?

The CUDA version in the official Ray Docker image is 10.1, but I need 10.2, otherwise my program may not work correctly. I tried to use my own image, but the launch failed:

```
==> /tmp/ray/session_latest/logs/monitor.log <==
2021-11-24 19:19:45,697	INFO autoscaler.py:699 -- StandardAutoscaler: Queue 2 new nodes for launch
2021-11-24 19:19:45,698	INFO node_launcher.py:78 -- NodeLauncher0: Got 2 nodes to launch.
2021-11-24 19:19:45,699	ERROR node_launcher.py:72 -- Launch failed
Traceback (most recent call last):
  File "/root/miniconda3/envs/py36/lib/python3.6/site-packages/ray/autoscaler/_private/node_launcher.py", line 70, in run
    self._launch_node(config, count, node_type)
  File "/root/miniconda3/envs/py36/lib/python3.6/site-packages/ray/autoscaler/_private/node_launcher.py", line 40, in _launch_node
    launch_config = copy.deepcopy(config["worker_nodes"])
KeyError: 'worker_nodes'
```

Ray version: 1.8.0

What does an image need so that it can be used as a cluster image?

I tried again and finally succeeded. Here are the problems you need to deal with when you use your own Docker image:

1. Keep the following packages at the same version in your Docker image and on your host:
   - python
   - ray
   - rsync

2. Set the locale variables (`LANG`, `LANGUAGE`, `LC_ALL`) to the same values in the image and on the host machine.

My versions:
- ray 1.9.1
- python 3.6
- rsync 3.1.2
- Docker base image: nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04

3. Install an SSH client in your Docker image.
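To check the three points before running `ray up`, a small script can print everything that has to match. This is a minimal sketch, not part of Ray: the helper names (`environment_report`, `mismatches`) and the script itself are hypothetical. It reports the Python, Ray and rsync versions, the locale variables, and whether `ssh` and `rsync` are on `PATH`. It is written to also run on Python 3.6.

```python
"""Hypothetical helper: print the environment details that must match
between the host and the Docker image (versions, locale, ssh/rsync)."""
import os
import platform
import re
import shutil
import subprocess

# Locale variables that should be identical on host and image (point 2).
LOCALE_VARS = ("LANG", "LANGUAGE", "LC_ALL")
# Tools that must be present (points 1 and 3); assumption: ssh and rsync.
REQUIRED_TOOLS = ("ssh", "rsync")


def ray_version():
    """Return the installed Ray version, or None if Ray is not importable."""
    try:
        import ray
    except ImportError:
        return None
    return ray.__version__


def rsync_version():
    """Return the version parsed from `rsync --version`, or None."""
    if shutil.which("rsync") is None:
        return None
    out = subprocess.run(["rsync", "--version"], stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    match = re.search(r"version\s+(\S+)", out)
    return match.group(1) if match else None


def environment_report():
    """Collect versions, locale variables and tool availability in a dict."""
    report = {
        "python": platform.python_version(),
        "ray": ray_version(),
        "rsync": rsync_version(),
    }
    for var in LOCALE_VARS:
        report[var] = os.environ.get(var)
    for tool in REQUIRED_TOOLS:
        report["has_" + tool] = shutil.which(tool) is not None
    return report


def mismatches(host, container):
    """Return the sorted keys whose values differ between two reports."""
    return sorted(k for k in set(host) | set(container)
                  if host.get(k) != container.get(k))


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print(f"{key}={value}")
```

For example, save it as `env_report.py`, run `python env_report.py` on the host and `docker run --rm -v "$PWD":/w <your-image> python /w/env_report.py` for the image, then compare the two outputs. Note that the full Python version is compared, so a patch-level difference also shows up.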

My Dockerfile:

```dockerfile
FROM nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04

# Locale must match the host's locale (point 2)
ENV LANG C.UTF-8
ENV LANGUAGE C.UTF-8
ENV LC_ALL C.UTF-8
ENV TZ Asia/Shanghai
ENV CONDA_HOME /root/miniconda3
ENV CONDA_BIN $CONDA_HOME/bin
ENV PY_BIN $CONDA_HOME/envs/py36/bin
ENV PATH $PY_BIN:$PATH

# Switch apt to a closer mirror (optional)
RUN  sed -i s@/archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources.list
RUN  apt-get clean && apt-get update && \
    apt-get install libsm6 libxrender1 libxext-dev gcc libmysqlclient-dev -y

# Python 3.6 in a conda env named py36, same version as the host (point 1)
COPY Miniconda3-4.7.12.1-Linux-x86_64.sh /root/
RUN chmod +x /root/Miniconda3-4.7.12.1-Linux-x86_64.sh && \
    cd /root/ && bash Miniconda3-4.7.12.1-Linux-x86_64.sh -b -p $CONDA_HOME && \
    $CONDA_BIN/conda create -y --name py36 python=3.6

# Make the conda env's shared libraries visible
ENV LD_LIBRARY_PATH /root/miniconda3/envs/py36/lib/:$LD_LIBRARY_PATH

# Python packages, including the same Ray version as the host (point 1)
RUN  pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && pip install requests==2.22.0 kafka-python==1.4.7 opencv-python==4.2.0.32 mysqlclient==2.0.1 DBUtils==2.0.1 aiohttp==3.7.4.post0 aioredis==1.3.1 grpcio==1.37 "ray[default]"==1.9.1

# rsync (point 1) and an SSH client (point 3)
RUN apt-get install rsync openssh-client -y
```
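Once the image is built, the cluster config has to reference it. The sketch below is a minimal illustration, assuming the field names from Ray's example config for the `local` node provider (`provider.type`, `head_ip`, `worker_ips`, `auth.ssh_user`, `docker.image`, `docker.container_name`, `docker.pull_before_run`); check them against the cluster-config reference for your Ray version. The image tag, IP addresses and user name are hypothetical placeholders. It prints the config as JSON, which YAML parsers generally accept, so the output can be redirected to a file and passed to `ray up`.

```python
"""Sketch: build a local-provider cluster config that uses a custom image.
All concrete values (image tag, IPs, user) are hypothetical placeholders."""
import json

# Hypothetical tag for the image built from the Dockerfile above.
CUSTOM_IMAGE = "my-registry/ray-cuda10.2:1.9.1"


def local_cluster_config(image, head_ip, worker_ips, ssh_user):
    """Return a cluster config dict for Ray's `local` node provider."""
    return {
        "cluster_name": "cuda102",
        "provider": {
            "type": "local",
            "head_ip": head_ip,
            "worker_ips": list(worker_ips),
        },
        "auth": {"ssh_user": ssh_user},
        "docker": {
            "image": image,
            "container_name": "ray_container",
            # Assumes the image already exists on every node; set to True
            # to pull it from a registry before starting the container.
            "pull_before_run": False,
        },
    }


if __name__ == "__main__":
    config = local_cluster_config(
        CUSTOM_IMAGE,
        head_ip="192.168.0.10",                       # placeholder
        worker_ips=["192.168.0.11", "192.168.0.12"],  # placeholders
        ssh_user="ubuntu",                            # placeholder
    )
    print(json.dumps(config, indent=2))
```

For example, `python make_config.py > cluster.yaml` followed by `ray up cluster.yaml`; the script name is hypothetical.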