I’m moving this Slack discussion over here, so I’ll try to summarize it as clearly and briefly as possible for future reference.
What I want: a metric computed over evaluation episodes, so that I can save and load checkpoints based on that score.
E.g.:
Train an agent to play Flappy Bird,
=> evaluate while training, using a custom environment evaluation config with unseen maps,
=> use those evaluation scores as the metric.
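For context, this is roughly the kind of config I mean. It is a sketch using Ray 1.x RLlib key names (evaluation_interval, evaluation_num_episodes, evaluation_config); the env name "FlappyBirdEnv" and the "maps" key in env_config are hypothetical placeholders for the custom environment.

```python
# Sketch of an RLlib (Ray 1.x key names) config that evaluates on unseen maps.
# "FlappyBirdEnv" and the "maps" env_config key are hypothetical placeholders.
config = {
    "env": "FlappyBirdEnv",            # hypothetical registered env name
    "env_config": {"maps": "train"},   # hypothetical: maps used for training
    "evaluation_interval": 25,         # run an evaluation round every 25 iterations
    "evaluation_num_episodes": 10,     # episodes per evaluation round (Ray 1.x name)
    # Overrides applied only to the evaluation workers:
    "evaluation_config": {
        "env_config": {"maps": "unseen"},  # hypothetical: held-out maps
        "explore": False,                  # act greedily during evaluation
    },
}
```

evaluation_interval is set to the same value as checkpoint_freq=25 in the tune.run call, on the assumption that this makes an evaluation result available on checkpointing iterations; I haven't verified the exact iteration alignment.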
In short: I can save custom metrics, which I can see in TensorBoard but cannot use to save checkpoints; and I can save values to the result dictionary, which load successfully as a checkpoint metric, but from there I have no access to episode statistics.
Details: I want to set checkpoint_score_attr here, based on evaluation episode statistics:
```python
from ray import tune

results = tune.run(
    "IMPALA",
    verbose=1,
    num_samples=1,
    config=config,  # RLlib config, defined elsewhere
    stop=stop,      # stopping criteria, defined elsewhere
    checkpoint_freq=25,
    checkpoint_at_end=True,
    sync_on_checkpoint=False,
    keep_checkpoints_num=50,
    checkpoint_score_attr='evaluation/episode_reward_mean',
    # or, using custom metrics from a custom callback class:
    # checkpoint_score_attr='custom_metrics/episode_reward_mean',
    # ... (remaining arguments omitted)
)
```
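As far as I understand, Tune flattens nested result dicts with a "/" delimiter before looking up checkpoint_score_attr (I believe via ray.tune.utils.flatten_dict), which is why a nested key like evaluation/episode_reward_mean can be used at all. Below is a minimal pure-Python sketch of that flattening, assuming this behaviour; flatten_result is a hypothetical stand-in, not Tune's own function, and the numbers are made up.

```python
from typing import Any, Dict


def flatten_result(d: Dict[str, Any], prefix: str = "", delimiter: str = "/") -> Dict[str, Any]:
    """Flatten nested dicts into 'parent/child' keys (hypothetical stand-in)."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        full_key = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse; an empty sub-dict contributes no keys at all.
            flat.update(flatten_result(value, full_key, delimiter))
        else:
            flat[full_key] = value
    return flat


# Toy result dict, shaped like a training result on an iteration where
# evaluation ran and no custom metrics were recorded (made-up numbers).
result = {
    "episode_reward_mean": 12.0,
    "custom_metrics": {},
    "evaluation": {"episode_reward_mean": 30.5, "custom_metrics": {}},
}

flat = flatten_result(result)
print("evaluation/episode_reward_mean" in flat)   # True
print("custom_metrics/episode_reward_mean" in flat)  # False
```

If this is right, a key only exists in a given iteration's result if something was actually reported under it that iteration, so a checkpoint taken on an iteration without an evaluation sub-dict would presumably fail the lookup.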
Using this example:
https://github.com/ray-project/ray/blob/master/rllib/examples/custom_metrics_and_callbacks.py
I created a similar CustomCallback class in which I manually record statistics from evaluation episodes (basically the sum of rewards per episode), so that I can then use them as the checkpoint_score_attr.
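A minimal sketch of the callback logic I mean, written without importing Ray so it runs on its own. Assumptions: in real code the class would subclass DefaultCallbacks (ray.rllib.agents.callbacks in Ray 1.x) and receive more hook arguments; episode.total_reward and the "in_evaluation" flag in the worker's policy_config are what I believe Ray 1.x provides, so please double-check them for your version. The metric name eval_episode_reward is hypothetical.

```python
from types import SimpleNamespace


class EvalRewardCallbacks:
    """Sketch of the callback logic. In real code this would subclass
    RLlib's DefaultCallbacks; only the hook arguments used here are kept."""

    def on_episode_end(self, *, worker, episode, **kwargs):
        # Assumption: evaluation workers carry "in_evaluation": True in
        # their policy config; training workers do not.
        if worker.policy_config.get("in_evaluation", False):
            # Sum of rewards over the whole episode, under a hypothetical name.
            episode.custom_metrics["eval_episode_reward"] = episode.total_reward


# Stand-ins for an RLlib worker and episode, only to exercise the logic.
eval_worker = SimpleNamespace(policy_config={"in_evaluation": True})
train_worker = SimpleNamespace(policy_config={})
episode = SimpleNamespace(total_reward=17.0, custom_metrics={})

cb = EvalRewardCallbacks()
cb.on_episode_end(worker=train_worker, episode=episode)
print(episode.custom_metrics)  # {}
cb.on_episode_end(worker=eval_worker, episode=episode)
print(episode.custom_metrics)  # {'eval_episode_reward': 17.0}
```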
But custom_metrics/..., which is saved in progress.csv, does not get saved to the result dictionary, which causes an error because the attribute used does not exist.
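One thing that may be worth checking: as far as I can tell from RLlib's metrics code (Ray 1.x), custom metrics are not reported under their raw name. Each one is summarized into name_mean, name_min and name_max, and metrics recorded on evaluation workers end up under the evaluation sub-dict. If that holds, a metric recorded as eval_episode_reward (hypothetical name) only during evaluation would appear as evaluation/custom_metrics/eval_episode_reward_mean, not under custom_metrics/... at the top level. The sketch below mimics that summarization in plain Python; it is an assumption about RLlib's behaviour, not its actual code (for instance, it skips any NaN handling).

```python
import statistics
from typing import Dict, List


def summarize_custom_metrics(per_episode: List[Dict[str, float]]) -> Dict[str, float]:
    """Collapse per-episode custom metrics into <name>_mean/_min/_max,
    mimicking (as an assumption) RLlib's episode summarization."""
    values: Dict[str, List[float]] = {}
    for metrics in per_episode:
        for name, value in metrics.items():
            values.setdefault(name, []).append(value)
    summary: Dict[str, float] = {}
    for name, vals in values.items():
        summary[f"{name}_mean"] = statistics.mean(vals)
        summary[f"{name}_min"] = min(vals)
        summary[f"{name}_max"] = max(vals)
    return summary


# Three evaluation episodes with made-up rewards.
eval_summary = summarize_custom_metrics([
    {"eval_episode_reward": 10.0},
    {"eval_episode_reward": 20.0},
    {"eval_episode_reward": 30.0},
])
print(eval_summary)
# {'eval_episode_reward_mean': 20.0, 'eval_episode_reward_min': 10.0, 'eval_episode_reward_max': 30.0}
```

Comparing these names against the column headers in progress.csv should show whether the checkpoint_score_attr string matches an existing key.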
But if I instead save directly to the result dict, as in the next image (with numeric values instead of ‘I would love to…’), the metric attribute seems to load without errors, since the value is added to the result dict.
The problem with this approach is that inside on_train_result I don’t have access to the statistics of my episodes, as I do in the previous callback methods.
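One possible workaround, sketched under an assumption I haven't confirmed for every Ray version: that the result passed to on_train_result already contains the evaluation sub-dict on iterations where evaluation ran. If so, the callback could copy the aggregated evaluation score (rather than per-episode statistics) into a top-level key, carrying the last value forward so the key exists on every iteration. EvalScoreTracker and the key name eval_score are hypothetical; in real code this method would live on the DefaultCallbacks subclass, and the sketch assumes the callback object persists across iterations.

```python
from typing import Any, Dict, Optional


class EvalScoreTracker:
    """Hypothetical helper: copies the latest evaluation score into a
    top-level result key so it exists on every training iteration."""

    def __init__(self, key: str = "eval_score") -> None:
        self.key = key
        self.last_score: Optional[float] = None

    def on_train_result(self, *, result: Dict[str, Any], **kwargs) -> None:
        evaluation = result.get("evaluation")
        if evaluation is not None and "episode_reward_mean" in evaluation:
            self.last_score = evaluation["episode_reward_mean"]
        # Carry the last known score forward on iterations without evaluation;
        # use -inf before the first evaluation so early checkpoints rank lowest.
        result[self.key] = self.last_score if self.last_score is not None else float("-inf")


tracker = EvalScoreTracker()
r1 = {"episode_reward_mean": 5.0}  # no evaluation this iteration
tracker.on_train_result(result=r1)
r2 = {"episode_reward_mean": 6.0, "evaluation": {"episode_reward_mean": 25.0}}
tracker.on_train_result(result=r2)
r3 = {"episode_reward_mean": 7.0}  # no evaluation again
tracker.on_train_result(result=r3)
print(r1["eval_score"], r2["eval_score"], r3["eval_score"])  # -inf 25.0 25.0
```

With this in place, checkpoint_score_attr='eval_score' would refer to a flat key that is present on every iteration. Carrying the value forward is a design choice: it avoids a missing-key error, but it means a checkpoint is scored by the most recent evaluation rather than by one run at that exact iteration.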
In short: I can save custom metrics, which I can see in TensorBoard but cannot use to save checkpoints; and I (think I) can save values to the result dictionary, which load successfully as a checkpoint metric, but from there I have no access to episode statistics.
I have tried several approaches suggested on Slack without success, and now I’m a bit lost.