Basic tutorial: Using RLlib with Docker

I wrote this tutorial because I ran into a lot of problems with dependency versions, Python versions, and GPU/CUDA support for TensorFlow 2 on Linux before I found that Docker can be the solution.

As a result of my struggles, I built my own Docker image with these dependencies inside:

ray[all]

tensorflow 2 GPU

gym[box2d], gym[atari]

How do you use it? In short, your system downloads the Docker image, creates a container, runs your code in it, and returns the results.

If you have no experience with Docker, you can think of a Docker container as:

- some kind of virtual machine with all the useful dependencies inside (strictly speaking, Docker isn't a VM), or

- an application with an input argument and an output directory for results, or

- an environment for running Python scripts.

In the first phase, it's less important to understand how it works than to know how to use it.

STEP 1

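If you want to use Docker, you have to install it first.

For Ubuntu, type in the console (the docker-ce packages come from Docker's apt repository, which has to be added first; see Install Docker Engine on Ubuntu | Docker Docs):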

sudo apt-get update

sudo apt-get install -y docker-ce docker-ce-cli containerd.io

For Windows, download and run the installer from: Install Docker Desktop on Windows | Docker Docs
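Before moving on, it can help to confirm that the docker command is actually on your PATH. A minimal sketch using only the Python standard library; docker_available is a hypothetical helper name, and it only checks that the executable can be found, not that the Docker daemon is running:

```python
import shutil


def docker_available():
    """Return True if a 'docker' executable can be found on the PATH."""
    return shutil.which("docker") is not None


def describe_docker_status():
    """Build a short human-readable status message."""
    if docker_available():
        return "docker found on PATH"
    return "docker not found, install it first (see STEP 1)"
```

Calling print(describe_docker_status()) in a Python console tells you whether STEP 1 succeeded.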

STEP 2

To download the image, type in the console:

docker pull peterpirogtf/ray_tf2

Of course, you can also try the official builds, or build your own image:

docker pull rayproject/ray:latest-gpu

STEP 3

Run the Docker image with a simple RLlib example:

docker run -it peterpirogtf/ray_tf2 rllib train --run=PPO --env=CartPole-v0

The -it option lets you communicate with the container through the console (input and output).

Options are described here: https://docs.docker.com/engine/reference/commandline/run/
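If you start the command above from a Python script instead of typing it, note that -it needs a real terminal; without one, docker run stops with a "not a TTY" error. A minimal sketch that builds the same command as a list and adds -it only when a terminal is attached. rllib_train_cmd and its tty parameter are hypothetical names for illustration; the image name and the rllib arguments are the ones from this tutorial:

```python
import sys

IMAGE = "peterpirogtf/ray_tf2"  # image from STEP 2


def rllib_train_cmd(algo="PPO", env="CartPole-v0", tty=None):
    """Build the 'docker run ... rllib train' command as an argument list."""
    if tty is None:
        # Only ask Docker for an interactive terminal if we really have one.
        tty = sys.stdin.isatty()
    cmd = ["docker", "run"]
    if tty:
        cmd.append("-it")
    cmd += [IMAGE, "rllib", "train", f"--run={algo}", f"--env={env}"]
    return cmd
```

The list can be passed to subprocess.run(rllib_train_cmd(), check=True) to actually start the training.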

If everything is correct, you will see the training progress printed in the console.

IMPORTANT:
In this configuration, your Docker container can't access directories and files outside the container, and no container ports are published to the host network.

I hope to write about useful docker run options later.
My Docker image is rather big, but I based it on the tensorflow2-gpu container to avoid GPU problems.

Peter


Thanks so much for this HOWTO tutorial @Peter_Pirog ! This is super useful for anyone interested in running RLlib anywhere inside a docker container.

@sven1977 This is a continuation of my tutorial, with a simple example (the copy/paste version of the code is at the end of this post).

Explanation:

If you want to use Docker, you have to install the Docker application first!
WINDOWS: Install Docker Desktop on Windows | Docker Docs
UBUNTU: Install Docker Engine on Ubuntu | Docker Docs

Comments on the docker run options (full description here: docker run | Docker Docs):
General:
-it - lets you communicate with the container through the console (you need this option to see the training results in the console)
--rm - removes the container after use. This deletes the container with all the data inside it, but without it you will accumulate a lot of leftover stopped containers

-d - runs the container in the background (with --rm, the container is still removed once it finishes). You can attach to it by first running the command:
"docker ps"
which returns the CONTAINER_ID. Copy it and use the command
"docker attach c92c2f29718e" ← use the ID of your container, not this one!
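The docker ps step can also be scripted instead of copying the ID by hand. A minimal sketch, assuming the docker CLI is installed; parse_docker_ps and running_containers are hypothetical helper names, and the --format template asks docker ps to print only the container ID and the image name on each line:

```python
import subprocess


def parse_docker_ps(output):
    """Turn lines of '<container id> <image>' into a list of (id, image) tuples."""
    containers = []
    for line in output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            containers.append((parts[0], parts[1]))
    return containers


def running_containers():
    """Ask the docker CLI for running containers (requires Docker to be installed)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.ID}} {{.Image}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)
```

Each returned ID can then be used with docker attach, as described above.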

Directories:

-v `pwd`:`pwd` - mounts a volume: the current directory of your machine becomes available at the same path inside the container. Use the backtick ` sign here, not " or '
-w `pwd` - sets the working directory inside the container
Network:
--network=host - connects the container to the physical machine's network. Without this option, the container uses its own internal network interface, so 127.0.0.1 inside the container is not the host. If you get errors, check whether the ports are already in use. Note that Docker ignores -p port publishing when --network=host is set
-p 8265:8265 - publishes the dashboard port
-p 6379:6379 - publishes the Ray head port
--expose=10000-10999 - exposes the worker ports
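To check whether a port is already busy before starting the container, you can try to connect to it from Python. A minimal sketch using only the standard library; port_is_busy is a hypothetical helper name, and it only detects a listener on the given host address (127.0.0.1 by default):

```python
import socket


def port_is_busy(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return s.connect_ex((host, port)) == 0


def busy_ports(ports=(8265, 6379)):
    """Return the subset of the given ports that are already in use."""
    return [p for p in ports if port_is_busy(p)]
```

Calling busy_ports() before docker run shows whether the dashboard port (8265) or the Ray head port (6379) is already taken on your machine.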

Assets:

IMPORTANT: You can't use a resource in your script if you don't declare it for your Docker container!
--gpus all - gives access to your GPUs, but:
Check whether CUDA is available for your graphics card model: https://www.geforce.co.uk/hardware/technology/cuda/supported-gpus
WINDOWS - needs a correct installation and configuration of WSL2 first, which can be problematic
UBUNTU - needs the NVIDIA drivers, the NVIDIA Container Toolkit, and a Docker image with a GPU build inside

    You can check whether TensorFlow sees a GPU by adding this to your code:

        import tensorflow as tf
        # tf.test.is_gpu_available() is deprecated in TensorFlow 2
        print('Visible GPUs:', tf.config.list_physical_devices('GPU'))

--cpus 8 - number of CPUs available to the container. If you declare 4, Ray can use only 4 CPUs even if the physical machine has 8
--shm-size=16g - sets the container's shared memory (/dev/shm) to 16 GB; Ray uses shared memory for its object store
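To confirm from inside the container that --shm-size took effect, one option is to look at the size of /dev/shm. A minimal sketch, assuming a Linux container where shared memory is mounted at /dev/shm; shm_size_gib is a hypothetical helper name, not part of Ray or Docker:

```python
import os
import shutil


def shm_size_gib(path="/dev/shm"):
    """Return the total size of the shared-memory mount in GiB, or None if it is missing."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / 2**30
```

Run inside the container, print(shm_size_gib()) should report a value close to what you passed to --shm-size.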

If you want to use TensorBoard, run the command:
tensorboard --logdir ~/ray_results --bind_all --port 6006

Version of the code to copy/paste (unfortunately, I had problems editing code in the post and lost the tabs while copying):

import os

file = 'tune_keras_functional.py'  # PUT THE PATH TO YOUR PY FILE HERE !!!

# USE ABSOLUTE PATHS
local_result_dir = '/home/peterpirog/ray_results/'  # PUT YOUR USER NAME, PATH ON YOUR PHYSICAL MACHINE
docker_result_dir = '/root/ray_results/'  # PATH INSIDE THE DOCKER CONTAINER

# Adjacent string literals inside the parentheses are joined into one command string.
docker_cmd = (
    'sudo docker run'
    ' -it --rm'
    ' -v `pwd`:`pwd`'
    f' -v {local_result_dir}:{docker_result_dir}'
    ' -w `pwd`'
    ' --network=host'
    ' -p 8265:8265'
    ' -p 6379:6379'
    ' --expose=10000-10999'
    ' --gpus all'
    ' --cpus 8'
    ' --shm-size=16g'
    ' peterpirogtf/ray_tf2:latest'
)

rllib_cmd = f'python3 {file}'
cmd = docker_cmd + ' ' + rllib_cmd
os.system(cmd)  # runs through the shell, which expands `pwd`
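As a possible alternative to building one shell string, the same command can be assembled as an argument list, which avoids quoting problems and does not depend on the shell expanding `pwd`. This is only a sketch under some assumptions: build_docker_cmd and its parameters are hypothetical names, sudo is left out (add it in front if your user is not in the docker group), and the -p options are dropped because Docker ignores them with --network=host:

```python
import os


def build_docker_cmd(script, local_result_dir,
                     docker_result_dir="/root/ray_results/",
                     image="peterpirogtf/ray_tf2:latest",
                     cpus=8, shm_size="16g", use_gpu=True):
    """Build the docker run command from this tutorial as an argument list."""
    cwd = os.getcwd()  # replaces `pwd`, no shell needed
    cmd = [
        "docker", "run", "-it", "--rm",
        "-v", f"{cwd}:{cwd}",
        "-v", f"{os.path.abspath(local_result_dir)}:{docker_result_dir}",
        "-w", cwd,
        "--network=host",
        "--cpus", str(cpus),
        f"--shm-size={shm_size}",
    ]
    if use_gpu:
        cmd += ["--gpus", "all"]
    cmd += [image, "python3", script]
    return cmd
```

To run it, pass the list to subprocess.run(build_docker_cmd('tune_keras_functional.py', '/home/peterpirog/ray_results/'), check=True).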


@Peter_Pirog Hi, your Docker image works well. That was nice!

Could you share your Dockerfile? I believe that matters even more, since other people could modify it to fit their own problems.