I’d like to create a setup in which I can store the best checkpoints.
It would serve two purposes:
- Centralized storage for the best checkpoints, with the ability to specify which model and version to use for other trainings (MLflow Model Registry — MLflow 1.30.0 documentation)
- The ability to track models on an internal web page (MLflow Model Registry — MLflow 2.2.2 documentation)
From my point of view, this could be solved by integrating MLflow with RLlib checkpointing, so my main question is about integrating these two frameworks.
However, if there is already a tool that fits my requirements, I can switch to it.
Hi @mlokos ,
We don’t offer this integration.
By default, checkpoints are written to disk.
But you can use ray.tune to specify network storage for checkpoints.
Have a look at this!
Tune/RLlib don’t manage models and checkpoints that extensively out of the box.
For example, there is no website like you mentioned.
Ultimately, this question is not specific to RLlib; it applies to other models that come out of the Ray universe as well.
Maybe @Yard1 has more to say about this.
Yes, Tune doesn’t provide these natively, but it does have pretty good integration with MLflow. You can take a look here.
I’ve done a workaround in which I add an MLmodel file to the checkpoint directory. With that trick I can connect the best checkpoints (stored as artifacts) with MLflow::ModelRegistry.
Although it would be great to have this functionality out-of-the-box (with additional parameters to compare similar environments), for now it is enough.
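The workaround could be sketched like this, using only the standard library. The flavor and loader module below are hypothetical placeholders, not an official schema; adjust the fields to whatever your MLflow version expects.

```python
# Sketch of the workaround: drop a minimal MLmodel file into an RLlib
# checkpoint directory so MLflow can treat that directory as a model.
# The python_function flavor and loader module are placeholders.
from datetime import datetime, timezone
from pathlib import Path


def add_mlmodel_file(checkpoint_dir: str) -> Path:
    """Write a minimal MLmodel YAML next to the checkpoint files."""
    mlmodel = Path(checkpoint_dir) / "MLmodel"
    mlmodel.write_text(
        "artifact_path: model\n"
        "flavors:\n"
        "  python_function:\n"
        "    loader_module: my_project.rllib_loader  # hypothetical loader\n"
        f"utc_time_created: '{datetime.now(timezone.utc).isoformat()}'\n"
    )
    return mlmodel
```

Once the checkpoint directory is logged as a run artifact, the model can then be registered from that run in the Model Registry.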
Nevertheless, thanks for responses ^^