I’d like to create a setup for storing my best training checkpoints. It would serve two purposes:
- Centralized storage for the best checkpoints, with the ability to specify which model and version to use in other training runs (MLflow Model Registry — MLflow 1.30.0 documentation)
- The ability to track models on an internal web page (MLflow Model Registry — MLflow 2.1.1 documentation)
From my point of view, this could be solved by integrating MLflow with RLlib checkpointing, so my main question is how to integrate these two frameworks.
However, if there is already a tool that fits these requirements, I can switch to it.
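For context, here is a rough sketch of the direction I have in mind, assuming Ray 2.x (where `MLflowLoggerCallback` lives in `ray.air.integrations.mlflow`) and an MLflow tracking server at `http://localhost:5000`. The experiment name `rllib-checkpoints`, the registered model name `my-rllib-model`, and the artifact path `checkpoint` are placeholders of mine, not fixed API values — the actual artifact layout depends on how the callback uploads checkpoints:

```python
def checkpoint_model_uri(run_id: str, artifact_path: str = "checkpoint") -> str:
    """Build the runs:/ URI that the MLflow Model Registry uses to
    locate an artifact inside a run. The artifact path is an assumption
    here; adjust it to whatever path the checkpoints actually land under."""
    return f"runs:/{run_id}/{artifact_path}"


def run_experiment(tracking_uri: str = "http://localhost:5000") -> None:
    """Train PPO with RLlib, mirror metrics/checkpoints to MLflow via
    MLflowLoggerCallback, then register the best run's checkpoint."""
    import mlflow
    from ray import air, tune
    from ray.air.integrations.mlflow import MLflowLoggerCallback

    experiment_name = "rllib-checkpoints"  # placeholder experiment name

    tuner = tune.Tuner(
        "PPO",
        param_space={"env": "CartPole-v1", "framework": "torch"},
        run_config=air.RunConfig(
            stop={"training_iteration": 20},
            callbacks=[
                MLflowLoggerCallback(
                    tracking_uri=tracking_uri,
                    experiment_name=experiment_name,
                    save_artifact=True,  # upload trial artifacts (incl. checkpoints)
                )
            ],
            checkpoint_config=air.CheckpointConfig(
                num_to_keep=3,
                checkpoint_score_attribute="episode_reward_mean",
            ),
        ),
    )
    tuner.fit()

    # Find the MLflow run with the best reward and register its checkpoint
    # under a named model, so other trainings can reference it as
    # "models:/my-rllib-model/<version>".
    mlflow.set_tracking_uri(tracking_uri)
    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=["metrics.episode_reward_mean DESC"],
        max_results=1,
    )
    best_run_id = runs.loc[0, "run_id"]
    mlflow.register_model(checkpoint_model_uri(best_run_id), "my-rllib-model")
```

Calling `run_experiment()` would then populate both the tracking UI (the internal web page) and the Model Registry, but I'm not sure this is the intended way to wire the two together — hence the question.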