Deprecating ray-ml images

GitHub issue: Deprecating ray-ml images · Issue #46378 · ray-project/ray

Hi Ray community,

We are deprecating rayproject/ray-ml container images.

Starting from Ray version 2.31.0:

  • The image tags will include a .deprecated marker in the version (such as 2.31.0.deprecated-).
  • We will stop updating the latest tags.
  • We will not build ray-ml container images for Python 3.12.
  • We might remove Python packages from future ray-ml (deprecated) images without notice.
  • We will try to keep building them for as long as we can, but we might stop publishing ray-ml (deprecated) images without notice.

But why?

In the past, we built and released ray-ml images as a convenient way for people to run machine-learning-related Python packages in a Ray environment. These images bundle more than 200 additional Python packages, including PyTorch, TensorFlow, JAX, XGBoost, Dask, and many others that Ray can work with. Installing all of these packages in one image has several drawbacks:

  • Most machine learning applications use only a small subset of these packages. For example, a Ray app rarely uses torch and tensorflow at the same time.
  • They increase the size of the container image by around 5GiB. For context, a typical GPU ray-ml image is around 10GiB (compressed layers): Ray and its system dependencies are no more than 1GiB, the Nvidia CUDA (devel) SDK is around 4GiB, and all the ML packages are around 5GiB.
  • As a result, the image takes longer to pull and load. When used in a cluster, it makes the cluster slower to launch and slower to scale up.
  • More importantly, as time goes by and new versions of these libraries get released, resolving all of these packages together without dependency conflicts becomes nearly impossible.
  • Over time, most serious users run into the limitations of ray-ml images, stop using them, and build on top of the ray images directly.

Therefore, to make ray images load faster, and to let Ray work better with newer versions of the Python interpreter and other machine learning libraries, we will stop recommending or supporting ray-ml as an “all-in-one” solution.

What should you use instead?

The release and publishing of rayproject/ray container images will remain unchanged.
https://hub.docker.com/r/rayproject/ray/tags
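
For example, you can pull the Python 3.10 CUDA image used in the examples below directly from Docker Hub:

docker pull rayproject/ray:2.31.0-py310-gpu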

You can pip install Python packages on top of rayproject/ray. A simple Dockerfile example:

FROM rayproject/ray:2.31.0-py310-gpu
RUN pip install torch

This installs the latest PyTorch on top of rayproject/ray:2.31.0-py310-gpu.
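
To build and sanity-check the resulting image locally, something like the following should work (my-ray-torch is just a placeholder image name for this sketch):

docker build -t my-ray-torch:2.31.0 .
docker run --rm my-ray-torch:2.31.0 python -c "import torch; print(torch.__version__)"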

If the package is one that Ray supports working with, you can also use the constraints file that ships with the image to install the exact library versions that we tested against during Ray’s release process:

FROM rayproject/ray:2.31.0-py310-gpu
RUN pip install torch -c /home/ray/requirements_compiled.txt

This results in much smaller images that load much faster than the ray-ml images.
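
If you want to check which version a package is pinned to, you can inspect the constraints file inside the image (this assumes the file follows the usual pip requirements format, e.g. torch==<version>):

docker run --rm rayproject/ray:2.31.0-py310-gpu grep "^torch" /home/ray/requirements_compiled.txt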


Please comment below and let me know if you have any questions.