Accessing RLlib evaluation in tune.Analysis

When running multiple RL experiments with evaluation during training, RLlib reports the evaluation metrics to TensorBoard as “ray/tune/evaluation/…”.
How can those metrics be accessed programmatically for analysis after training?
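
For reference, the training setup looks roughly like this (a minimal sketch; the algorithm, environment, and config values are placeholders, not my exact setup):

import ray.tune as tune

tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",
        "evaluation_interval": 5,       # evaluate every 5 train() iterations
        "evaluation_num_episodes": 10,  # episodes per evaluation round
    },
    stop={"training_iteration": 20},
    local_dir="/home/username/ray_results/my_experiments",
)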

I looked into tune.Analysis to easily get statistics about multiple experiments; it works great, but it has everything except the evaluation data :confused:

import ray.tune as tune

EXPERIMENT_FOLDER = "/home/username/ray_results/my_experiments"
analysis = tune.Analysis(
    EXPERIMENT_FOLDER, default_metric="episode_reward_mean", default_mode="max"
)
df = analysis.dataframe()  # one row per trial, built from each trial's last result
for c in df.columns:
    print(c)

This will print:

episode_reward_max
episode_reward_min
episode_reward_mean
episode_len_mean
episodes_this_iter
... 
custom_metrics/... 
...
config/... 

I am looking for a similar solution to get the evaluation data. Thanks!

Using Ray version 1.2.0.

@MaximeBouton actually, I’m not sure. Could you ask this under Ray Tune?

The evaluation data is returned under the “evaluation” top-level key of the metrics dict that an RLlib Trainer.train() call returns.
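
Concretely, something along these lines should show the evaluation stats when calling train() by hand (a minimal sketch against the Ray 1.2-era agents API; the algorithm and env are arbitrary):

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
# With evaluation_interval set, the result dict of each matching iteration
# carries an "evaluation" sub-dict with the usual episode stats.
trainer = PPOTrainer(config={"env": "CartPole-v0", "evaluation_interval": 1})
result = trainer.train()
print(result["evaluation"]["episode_reward_mean"])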

@kai @amogkam @rliaw ?

Thank you for the answer.
From my own investigation, I found that Tune pulls the data from result.json, and the “evaluation” key is not reported in this file.
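
To double-check what actually lands in result.json, you can read it directly; Tune writes one JSON dict per training iteration, one per line (the trial directory below is a hypothetical name):

import json
import os

trial_dir = "/home/username/ray_results/my_experiments/some_trial"
with open(os.path.join(trial_dir, "result.json")) as f:
    first_result = json.loads(f.readline())  # dict for the first iteration
print(sorted(first_result.keys()))  # no "evaluation" key shows up here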

(I changed the topic to Ray Tune.)

Could this have something to do with the fact that the dict returned by Trainer.train() does not initially contain the key, since evaluation only runs every n iterations?
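
If the key does make it into some rows of result.json (only the iterations where evaluation actually ran), a guard like this would collect just the evaluation metrics (again a sketch; the path is hypothetical):

import json

eval_rewards = []
with open("/home/username/ray_results/my_experiments/some_trial/result.json") as f:
    for line in f:
        result = json.loads(line)
        if "evaluation" in result:  # absent on non-evaluation iterations
            eval_rewards.append(result["evaluation"]["episode_reward_mean"])
print(eval_rewards)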

I would also be very interested in a solution to this. Does the problem still exist?

@MaximeBouton @LukasNothhelfer

Take a look at this message and the one below it for an explanation of the issue and how to fix it.