The official Ray Docker image ships with CUDA 10.1, but I need CUDA 10.2, otherwise my program may not work correctly. I tried to use my own image, but the cluster failed to launch nodes:
==> /tmp/ray/session_latest/logs/monitor.log <==
2021-11-24 19:19:45,697 INFO autoscaler.py:699 -- StandardAutoscaler: Queue 2 new nodes for launch
2021-11-24 19:19:45,698 INFO node_launcher.py:78 -- NodeLauncher0: Got 2 nodes to launch.
2021-11-24 19:19:45,699 ERROR node_launcher.py:72 -- Launch failed
Traceback (most recent call last):
  File "/root/miniconda3/envs/py36/lib/python3.6/site-packages/ray/autoscaler/_private/node_launcher.py", line 70, in run
    self._launch_node(config, count, node_type)
  File "/root/miniconda3/envs/py36/lib/python3.6/site-packages/ray/autoscaler/_private/node_launcher.py", line 40, in _launch_node
    launch_config = copy.deepcopy(config["worker_nodes"])
KeyError: 'worker_nodes'
Ray version: 1.8.0
What does an image need to provide so that it can be used as a cluster image?
I tried again and finally succeeded. Here are the things you need to take care of when using your own Docker image.
1 Keep the following packages at the same version in your Docker image and on your host:
python
ray
rsync
2 The locale settings in the image and on the host machine should be the same.
My package versions:
ray 1.9.1
python 3.6
rsync 3.12
Docker base image:
nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
3 Install an SSH client in your Docker image. Here is my Dockerfile:
FROM nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
ENV LANG C.UTF-8
ENV LANGUAGE C.UTF-8
ENV LC_ALL C.UTF-8
ENV TZ Asia/Shanghai
ENV CONDA_HOME /root/miniconda3
ENV CONDA_BIN $CONDA_HOME/bin
ENV PY_BIN $CONDA_HOME/envs/py36/bin
ENV PATH $PY_BIN:$PATH
RUN sed -i s@/archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources.list
RUN apt-get clean && apt-get update && \
    apt-get install libsm6 libxrender1 libxext-dev gcc libmysqlclient-dev -y
# ffmpeg
# python py36
COPY Miniconda3-4.7.12.1-Linux-x86_64.sh /root/
RUN chmod +x /root/Miniconda3-4.7.12.1-Linux-x86_64.sh && \
    cd /root/ && bash Miniconda3-4.7.12.1-Linux-x86_64.sh -b -p $CONDA_HOME && \
    $CONDA_BIN/conda create -y --name py36 python=3.6
#
ENV LD_LIBRARY_PATH /root/miniconda3/envs/py36/lib/:$LD_LIBRARY_PATH
# Install Python packages (ray must match the version on the host)
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \
    pip install requests==2.22.0 kafka-python==1.4.7 opencv-python==4.2.0.32 \
    mysqlclient==2.0.1 DBUtils==2.0.1 aiohttp==3.7.4.post0 aioredis==1.3.1 \
    grpcio==1.37 "ray[default]"==1.9.1
# rsync and an SSH client are needed for the image to work as a cluster image
RUN apt-get update && apt-get install -y rsync openssh-client
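To check points 1 to 3 quickly, something like the following script could be run once on the host and once inside the container, and the two outputs compared. This is only a rough sketch and was not part of my original setup: the function names (`environment_report`, `diff_reports`) are made up for illustration, and it only looks at the Python, Ray and rsync versions, the `LANG`/`LC_ALL` variables, and whether an `ssh` binary is on the `PATH`.

```python
import os
import platform
import shutil
import subprocess


def rsync_version():
    """Return the first line of `rsync --version`, or None if rsync is missing."""
    if shutil.which("rsync") is None:
        return None
    result = subprocess.run(
        ["rsync", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # works on Python 3.6 as well
    )
    lines = result.stdout.splitlines()
    return lines[0].strip() if lines else None


def ray_version():
    """Return the installed Ray version, or None if Ray cannot be imported."""
    try:
        import ray
    except ImportError:
        return None
    return ray.__version__


def environment_report():
    """Collect the settings that should match between host and image."""
    return {
        "python": platform.python_version(),
        "ray": ray_version(),
        "rsync": rsync_version(),
        "ssh": shutil.which("ssh"),  # path of the SSH client, None if absent
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


def diff_reports(host, image, keys=("python", "ray", "rsync", "LANG", "LC_ALL")):
    """Return the sorted keys whose values differ between two reports."""
    return sorted(k for k in keys if host.get(k) != image.get(k))


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the printed values for `python`, `ray`, `rsync`, `LANG` or `LC_ALL` differ between the host and the container, or `ssh` shows `None` inside the container, that is the first thing I would fix before starting the cluster again.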