Save model parameters at each checkpoint

I would like to save the model parameters (.pb, .h5) at each checkpoint, because we would like
to compare the various stages of training outside of the Ray/RLlib framework and the
models are relatively small. At the moment it is not possible to know ahead of time how many
training iterations will be needed.

I have confirmed saving at the end of training works:

from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.tune.trial import ExportFormat

tune.run(PPOTrainer,
         config={"env": "CartPole-v0"},
         export_formats=[ExportFormat.MODEL, ExportFormat.H5, ExportFormat.CHECKPOINT],
         local_dir='cart_outputs3',
         stop={"training_iteration": 1}
         )

PPO_CartPole-v0_91fad_00000_0_2021-07-14_09-14-54
├── checkpoint
│   ├── checkpoint
│   ├── model.data-00000-of-00001
│   ├── model.index
│   └── model.meta
├── events.out.tfevents.1626250494.velocity
├── model
│   ├── events.out.tfevents.1626250523.velocity
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00002
│       ├── variables.data-00001-of-00002
│       └── variables.index
├── params.json
├── params.pkl
├── progress.csv
└── result.json

(The first problem is that no .h5 file is created, despite H5 being listed as an available export format.)
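For completeness, this is how I intend to consume the exported model outside of Ray - a minimal sketch with plain TensorFlow 2.x, just loading the SavedModel and listing its serving signatures (the path below is illustrative; adjust it to your actual trial directory):

import tensorflow as tf

# Path to the "model" directory produced by ExportFormat.MODEL (illustrative).
export_dir = "cart_outputs3/PPO_CartPole-v0_91fad_00000_0_2021-07-14_09-14-54/model"

loaded = tf.saved_model.load(export_dir)
# The available signature keys and tensor names depend on the RLlib/TF versions.
print(list(loaded.signatures.keys()))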

Now we run Tune with periodic checkpointing:

results = tune.run(args.run, config=config, stop=stop, checkpoint_freq=2,
                   export_formats=[ExportFormat.MODEL, ExportFormat.H5],
                   num_samples=1, checkpoint_at_end=False)

But in this case only the raw checkpoint files appear in each checkpoint directory:
checkpoint_000002
├── checkpoint-2
└── checkpoint-2.tune_metadata

In a previous version of Ray (I think 0.8.0), setting

"checkpoint_freq": 2,
"checkpoint_at_end": True

in the config and using run_experiments would create the model data under each checkpoint directory:

run_experiments({"EnvName": myconfig})

So how can one save the model parameters (TensorFlow in this case) to .pb or .h5 at each
checkpoint (the model is small) using ray.tune? Many thanks!
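The closest workaround I can think of (not sure it is the intended approach) is to restore every saved checkpoint into a fresh trainer after training and export the policy model manually. A rough, untested sketch, assuming the run uses PPO, config is the same dict passed to tune.run, and results is the ExperimentAnalysis it returns:

import os

from ray.rllib.agents.ppo import PPOTrainer

trial = results.trials[0]
agent = PPOTrainer(config=config)

# get_trial_checkpoints_paths() returns (checkpoint_path, metric) tuples.
for path, _ in results.get_trial_checkpoints_paths(trial, metric="training_iteration"):
    agent.restore(path)
    # Export the default policy as a TF SavedModel next to the checkpoint files.
    agent.export_policy_model(os.path.join(os.path.dirname(path), "exported_model"))

But this only works after the fact and still requires spinning up a trainer per export, which is exactly the kind of overhead I was hoping to avoid.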

As an additional point, I am using ray==1.4.1 on Mac OS X. Is saving a model in a recoverable format (.pb, .h5) at each checkpoint a supported feature? Is there any other information needed? I am kind of stuck at this point. Cheers.

If anyone has information about this I would greatly appreciate it; I have tried several forums and scoured the internet for this question, which seems so basic and essential. There are quite a few queries about it, but no response anywhere. It would seem quite silly to have all this fantastic framework for training and tuning, yet be unable to actually use the trained model outside of a Ray actor, which forces one to use the service/ports, etc. As I mentioned, saving a model at each checkpoint used to work in previous versions. I have tried this on Mac OS X and Linux and get the same result - a checkpoint only contains the following files:
checkpoint-1479
checkpoint-1479.tune_metadata
despite specifying the MODEL and H5 export formats. No error is produced during training:

results = tune.run(args.run, 
                    config=config, 
                    stop=stop, 
                    checkpoint_freq=1, 
                    export_formats=[ExportFormat.MODEL, 
                           ExportFormat.H5, ExportFormat.CHECKPOINT], 
                    checkpoint_at_end=True
                )

I’ve been using Ray since the initial version and never had this issue - please help.
From a design standpoint I don’t think it would make sense to rely on the ‘results’ object from tuning: firstly, one may not know how many iterations are needed ahead of time and may need to hit CTRL-C to stop training, in which case all of that would be lost. Instead, the usual point of a checkpoint is to save the model so that recovery can start from an arbitrary point. The other objective is inference: we may want to compare inference across different checkpoints, but for that we need to re-create the network model in TensorFlow without the overhead of Ray actors in the way.
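To make that concrete, this is the kind of Ray-free comparison I am after, assuming a SavedModel has been exported per checkpoint (the directory names below are made up) and that the export has a single observation input - the exact signature and tensor names depend on the RLlib/TF versions, so the sketch reads the input name from the signature instead of hard-coding it:

import numpy as np
import tensorflow as tf

def run_policy(export_dir, obs_batch):
    loaded = tf.saved_model.load(export_dir)
    infer = loaded.signatures["serving_default"]
    # Read the input tensor name from the signature; it varies between versions.
    _, kwargs_spec = infer.structured_input_signature
    input_name = list(kwargs_spec.keys())[0]
    return infer(**{input_name: tf.constant(obs_batch, dtype=tf.float32)})

obs = np.zeros((1, 4), dtype=np.float32)  # e.g. a single CartPole-v0 observation
early = run_policy("checkpoint_000002/exported_model", obs)
late = run_policy("checkpoint_000020/exported_model", obs)
print(early)
print(late)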

Hi,
Sorry for getting back to you late. Could you try the latest 1.7.0 release? Also, could you share a repro script? We are actually revamping the Tune checkpointing logic and would love to take a look at this issue!

Many thanks - I tried slightly modifying one of the Tune examples to snapshot a model at every checkpoint, using ray 1.8.0 - there is still no model saved at any checkpoint:

##########
# Contribution by the Center on Long-Term Risk:
# GitHub - longtermrisk/marltoolbox: A toolbox with the goal of speeding up
# research on bargaining in MARL (cooperation problems in MARL).
##########
import argparse
import os

import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.examples.env.coin_game_non_vectorized_env import \
    CoinGame, AsymCoinGame

parser = argparse.ArgumentParser()
parser.add_argument("--tf", action="store_true")
parser.add_argument("--stop-iters", type=int, default=2000)

def main(debug, stop_iters=2000, tf=False, asymmetric_env=False):
    train_n_replicates = 1 if debug else 1
    seeds = list(range(train_n_replicates))

    ray.init()

    env_config = {
        "players_ids": ["player_red", "player_blue"],
        "max_steps": 20,
        "grid_size": 3,
        "get_additional_info": True,
    }

    rllib_config = {
        "env": AsymCoinGame if asymmetric_env else CoinGame,
        "env_config": env_config,
        "multiagent": {
            "policies": {
                env_config["players_ids"][0]: (
                    None, AsymCoinGame(env_config).OBSERVATION_SPACE,
                    AsymCoinGame.ACTION_SPACE, {}),
                env_config["players_ids"][1]: (
                    None, AsymCoinGame(env_config).OBSERVATION_SPACE,
                    AsymCoinGame.ACTION_SPACE, {}),
            },
            "policy_mapping_fn": lambda agent_id, **kwargs: agent_id,
        },
        # Size of batches collected from each worker.
        "rollout_fragment_length": 20,
        # Number of timesteps collected for each SGD round.
        # This defines the size of each SGD epoch.
        "train_batch_size": 512,
        "model": {
            "dim": env_config["grid_size"],
            "conv_filters": [[16, [3, 3], 1],
                             [32, [3, 3],
                              1]]  # [Channel, [Kernel, Kernel], Stride]]
        },
        "lr": 5e-3,
        "seed": tune.grid_search(seeds),
        "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
        "framework": "tf" if tf else "torch",
    }

    from ray.tune.trial import ExportFormat

    stop = {
        "training_iteration": 20 if debug else stop_iters,
    }

    tune_analysis = tune.run(
        PPOTrainer,
        config=rllib_config,
        stop=stop,
        checkpoint_freq=1,
        export_formats=[ExportFormat.MODEL, ExportFormat.H5],
        checkpoint_at_end=True,
        name="PPO_AsymCG")

    ray.shutdown()

    return tune_analysis

if __name__ == "__main__":
    args = parser.parse_args()
    debug_mode = True
    use_asymmetric_env = False
    main(debug_mode, args.stop_iters, args.tf, use_asymmetric_env)

PPO_CoinGame_d2693_00000_0_seed=0_2021-11-06_10-39-30
├── checkpoint_000001
│ ├── checkpoint-1
│ └── checkpoint-1.tune_metadata
├── checkpoint_000002
│ ├── checkpoint-2
│ └── checkpoint-2.tune_metadata
├── checkpoint_000003
│ ├── checkpoint-3
│ └── checkpoint-3.tune_metadata
├── checkpoint_000004
│ ├── checkpoint-4
│ └── checkpoint-4.tune_metadata
├── checkpoint_000005
│ ├── checkpoint-5
│ └── checkpoint-5.tune_metadata
├── checkpoint_000006
│ ├── checkpoint-6
│ └── checkpoint-6.tune_metadata
├── checkpoint_000007
│ ├── checkpoint-7
│ └── checkpoint-7.tune_metadata
├── checkpoint_000008
│ ├── checkpoint-8
│ └── checkpoint-8.tune_metadata
├── checkpoint_000009
│ ├── checkpoint-9
│ └── checkpoint-9.tune_metadata
├── checkpoint_000010
│ ├── checkpoint-10
│ └── checkpoint-10.tune_metadata
├── checkpoint_000011
│ ├── checkpoint-11
│ └── checkpoint-11.tune_metadata
├── checkpoint_000012
│ ├── checkpoint-12
│ └── checkpoint-12.tune_metadata
├── checkpoint_000013
│ ├── checkpoint-13
│ └── checkpoint-13.tune_metadata
├── checkpoint_000014
│ ├── checkpoint-14
│ └── checkpoint-14.tune_metadata
├── checkpoint_000015
│ ├── checkpoint-15
│ └── checkpoint-15.tune_metadata
├── checkpoint_000016
│ ├── checkpoint-16
│ └── checkpoint-16.tune_metadata
├── checkpoint_000017
│ ├── checkpoint-17
│ └── checkpoint-17.tune_metadata
├── checkpoint_000018
│ ├── checkpoint-18
│ └── checkpoint-18.tune_metadata
├── checkpoint_000019
│ ├── checkpoint-19
│ └── checkpoint-19.tune_metadata
├── checkpoint_000020
│ ├── checkpoint-20
│ └── checkpoint-20.tune_metadata
├── events.out.tfevents.1636195170.wormwood
├── params.json
├── params.pkl
├── progress.csv
└── result.json

Hello - is this a suitable example? If not, it should be possible to create a simpler one - I pulled an existing example and added the necessary parameters. Thank you!

Hi, sorry about our delay in responding to you.

I completely agree that exporting a model with every checkpoint is a really useful thing to have.
Did it really work at some point in history? Do you remember which version it was? Were you using Tune or just RLlib?
In any case, this is something we really want to clean up this quarter. We should make it easy for users to take the trained model out.

Your repro script also revealed another potential problem here: since you are running multi-agent training without a policy under default_policy_id, the export_model() call errors out for us.
We should probably fix RLlib to export everything listed under the multiagent.policies_to_train key.
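(In the meantime, a per-policy export might work as a stopgap - an untested sketch, where agent is a restored Trainer, export_root is a directory of your choice, and the policy IDs come from the multiagent config in your repro script:)

import os

# Export each policy's model individually instead of relying on the
# default_policy_id; policy IDs are taken from the multiagent config.
for policy_id in rllib_config["multiagent"]["policies"]:
    agent.export_policy_model(
        os.path.join(export_root, policy_id), policy_id=policy_id)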

I think it worked all the way back in 0.8.X-1.2.X - I may be able to verify this with old setups - always with Ray/Tune and a custom model. I generally like the flexibility of Tune for configuring parameters, combined with a custom model (policy), which may be quite complex. The Ray project would be well served by documenting early on, with examples, how to load and save models in the native (TensorFlow/PyTorch) formats. Also, I’d be happy to use any other approach if Tune is not the right way to achieve this. Thanks so much!

Update: we are scoping this out to support exporting the model for every checkpoint (not just at the end)… @kai

Great, thank you - would this be in the next version?

Hi, was this implemented?