When running multiple RL experiments with evaluation during training, RLlib reports the evaluation metrics to TensorBoard as “ray/tune/evaluation/…”.
How is it possible to access those metrics programmatically for analysis after training?
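To give a bit more context, the experiments are launched through tune.run with evaluation enabled during training; a simplified sketch of the setup (the algorithm, env, stop condition, and interval value here are just placeholders, not my exact config):

from ray import tune

# Simplified launch: PPO with evaluation run every 5 training iterations.
tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",        # placeholder env
        "evaluation_interval": 5,    # evaluate every 5 train() iterations
    },
    stop={"training_iteration": 100},
    local_dir="/home/username/ray_results/my_experiments",
)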
I looked into tune.Analysis, which makes it easy to get statistics about multiple experiments. It works great, but it has everything except the evaluation data:
import ray.tune as tune

EXPERIMENT_FOLDER = "/home/username/ray_results/my_experiments"

analysis = tune.Analysis(EXPERIMENT_FOLDER, default_metric="episode_reward_mean", default_mode="max")
df = analysis.dataframe()

# Print every column Tune collected; there are no "evaluation/..." columns.
for c in df.columns:
    print(c)
Thank you for the answer.
From my own investigation, I found that Tune uses result.json to pull the data, and the “evaluation” key is not reported in this file.
Could this have something to do with the fact that the dict returned by Trainer.train() does not initially contain the key, since evaluation is only run every n iterations?
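For what it’s worth, here is a rough sketch of how one can scan the result.json files directly to check on which iterations the “evaluation” key actually appears (assuming each trial directory under the experiment folder contains a result.json with one JSON dict per training iteration):

import json
from pathlib import Path

EXPERIMENT_FOLDER = Path("/home/username/ray_results/my_experiments")

# Each trial directory holds a result.json with one JSON dict per training iteration.
for result_file in EXPERIMENT_FOLDER.glob("*/result.json"):
    eval_iterations = []
    with open(result_file) as f:
        for line in f:
            result = json.loads(line)
            # The "evaluation" sub-dict should only be present on iterations
            # where evaluation actually ran.
            if "evaluation" in result:
                eval_iterations.append(result["training_iteration"])
    print(result_file.parent.name, "evaluation reported on iterations:", eval_iterations)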