RLlib + MLflow (+ Serve) workflow

Hi! I'm curious what the suggested/intended workflow is for a full cycle of: training with Tune/RLlib → storing artifacts/models in MLflow → using the trained model in a custom script (with or without Serve). My question is mainly about what should be stored in MLflow: a PyTorch model extracted from a policy (e.g. PPO), or a whole trainer (PPOTrainer) as some kind of Python function? I’d like to have an MLflow Model available from my experiment, and an option to instantiate a PPOTrainer from that model.

One possible workflow would be (this is from one of our industry users):

  1. Pre-process historic data (offline RL).
  2. Train RLlib Trainer for some time.
  3. Trainer.save() or Trainer.get_policy().export_model() → MLflow?
  4. Repeat 2) and 3) (e.g. for hyperparameter tuning with Tune).
  5. Evaluate stored models (pick a good one to continue training or serving).
  6. Trainer.restore() (we currently have no Trainer.get_policy().import_model() method).
  7. Serve model (e.g. see ray/rllib/examples/serve_and_rllib.py) via Ray Serve.

Yeah, I think only the trained Trainer should be stored in MLflow.

By that, do you mean a directory with checkpoints and config files, so that Trainer.restore() can be used to extract the policy?

Maybe @amogkam or @architkulkarni would have some perspective here too?

I know less about the RLlib side of things so I’m not sure if this will be relevant to the original question, but here’s a reference on how to use Ray Serve with MLflow models: Serving Machine Learning Models — Ray v2.0.0.dev0
The following blog post about using Tune and Serve with MLflow might also be helpful: Anyscale - Ray & MLflow: Taking Distributed Machine Learning Applications to Production