RLlib integration with MLflow model registry

mlokos · February 24, 2023, 8:21am

I’d like to create a setup in which I can store the best checkpoints.
It would serve two purposes:

There would be centralized storage for the best checkpoints, with the option to specify which model and version to use for other training runs (MLflow Model Registry — MLflow 1.30.0 documentation)
It would be possible to track models on an internal web page (MLflow Model Registry — MLflow 2.2.2 documentation)

As I see it, this could be solved by integrating MLflow with RLlib checkpointing, which is why my main question is about integrating these two frameworks.

However, if a tool that fits my requirements already exists, I can switch to it.

arturn · April 13, 2023, 10:47pm

Hi @mlokos ,

We don’t offer this integration.
By default, checkpoints are written to disk.
But you can use ray.tune to specify network storage for checkpoints.
Have a look at this!

Tune/RLlib don’t manage models and checkpoints that extensively out of the box.
For example, there is no web page like the one you mentioned.

Ultimately, this question is not specific to RLlib; it also applies to other models that come out of the Ray universe.
Maybe @Yard1 has more to say about this.

xwjiang2010 · April 14, 2023, 12:27am

Yes, Tune doesn’t provide these natively, but it does have pretty good integration with MLflow. You can take a look here.
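
For reference, this integration is usually wired in as a Tune callback. The following is only a minimal sketch, assuming a Ray 2.x release from around the time of this thread, where `MLflowLoggerCallback` lives in `ray.air.integrations.mlflow`. The helper names, the tracking URI, the experiment name, and the PPO/CartPole settings are all hypothetical, and import paths may differ in other Ray versions.

```python
def mlflow_callback_kwargs(tracking_uri, experiment_name):
    """Collect the MLflowLoggerCallback arguments in one place (hypothetical helper)."""
    return {
        "tracking_uri": tracking_uri,
        "experiment_name": experiment_name,
        # Ask the callback to upload each trial's directory as MLflow artifacts.
        "save_artifact": True,
    }


def build_tuner(tracking_uri, experiment_name):
    """Build a Tuner that logs an RLlib run to MLflow (hypothetical helper)."""
    # Imported lazily so the helper above can be used without Ray installed.
    from ray import air, tune
    from ray.air.integrations.mlflow import MLflowLoggerCallback

    callback = MLflowLoggerCallback(
        **mlflow_callback_kwargs(tracking_uri, experiment_name)
    )
    return tune.Tuner(
        "PPO",  # assumption: any registered RLlib algorithm name works here
        param_space={"env": "CartPole-v1", "framework": "torch"},
        run_config=air.RunConfig(
            stop={"training_iteration": 5},
            checkpoint_config=air.CheckpointConfig(checkpoint_at_end=True),
            callbacks=[callback],
        ),
    )
```

Calling `build_tuner("http://localhost:5000", "rllib-ppo").fit()` would then run the trial and log it to the given tracking server. This covers experiment tracking and artifact upload only; it does not register anything in the Model Registry.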

mlokos · April 21, 2023, 2:05pm

I implemented a workaround in which I add an MLmodel file to the checkpoint directory. With that trick I can connect the best checkpoints (stored as artifacts) with MLflow::ModelRegistry.
Although it would be great to have this functionality out of the box (with additional parameters to compare similar environments), for now it is enough.
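
For anyone who wants to try something similar, here is a minimal sketch of this kind of workaround. It is not the exact code used above: the contents of the MLmodel file, the `my_rllib_loader` module name, the `checkpoint` artifact path, and both function names are assumptions made for illustration. Only `mlflow.start_run`, `mlflow.log_artifacts`, and `mlflow.register_model` are actual MLflow calls.

```python
import os
import tempfile


def write_mlmodel_stub(checkpoint_dir, loader_module="my_rllib_loader"):
    """Write a minimal MLmodel file next to the checkpoint files.

    The flavor layout is an assumption; `loader_module` is a hypothetical
    module that would know how to restore the RLlib checkpoint.
    """
    path = os.path.join(checkpoint_dir, "MLmodel")
    content = (
        "flavors:\n"
        "  python_function:\n"
        f"    loader_module: {loader_module}\n"
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path


def log_and_register(checkpoint_dir, model_name, artifact_path="checkpoint"):
    """Upload the checkpoint directory as run artifacts and register it."""
    # Imported lazily so write_mlmodel_stub works without MLflow installed.
    import mlflow

    write_mlmodel_stub(checkpoint_dir)
    with mlflow.start_run() as run:
        mlflow.log_artifacts(checkpoint_dir, artifact_path=artifact_path)
        model_uri = f"runs:/{run.info.run_id}/{artifact_path}"
    # Creates a new version under `model_name` in the Model Registry.
    return mlflow.register_model(model_uri, model_name)
```

Here `log_and_register(best_checkpoint_dir, "ppo-cartpole")` would create a new registered version for each best checkpoint, where both arguments are placeholders.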

Nevertheless, thanks for the responses ^^
