How can I get evaluation metrics from ExperimentAnalysis?

Hi,
I can get episode_reward_[min/mean/max] from ExperimentAnalysis.trial_dataframes. However I cannot get evaluation/episode_reward_[min/mean/max] from ExperimentAnalysis.trial_dataframes when I enable evaluation. I do see the evaluation/episode_reward_[min/mean/max] in tensorboard. May I know if I should do extra configuration to make it work?
Thanks,
James

Below is what I get from ExperimentAnalysis.trial_dataframes. Unfortunately, there are no evaluation metrics in it.

Index(['episode_reward_max', 'episode_reward_min', 'episode_reward_mean',
       'episode_len_mean', 'episodes_this_iter', 'num_healthy_workers',
       'timesteps_total', 'timesteps_this_iter', 'agent_timesteps_total',
       'done', 'episodes_total', 'training_iteration', 'trial_id',
       'experiment_id', 'date', 'timestamp', 'time_this_iter_s',
       'time_total_s', 'pid', 'hostname', 'node_ip', 'time_since_restore',
       'timesteps_since_restore', 'iterations_since_restore',
       'hist_stats/episode_reward', 'hist_stats/episode_lengths',
       'sampler_perf/mean_raw_obs_processing_ms',
       'sampler_perf/mean_inference_ms',
       'sampler_perf/mean_action_processing_ms',
       'sampler_perf/mean_env_wait_ms', 'sampler_perf/mean_env_render_ms',
       'timers/sample_time_ms', 'timers/sample_throughput',
       'timers/load_time_ms', 'timers/load_throughput', 'timers/learn_time_ms',
       'timers/learn_throughput', 'info/num_steps_sampled',
       'info/num_agent_steps_sampled', 'info/num_steps_trained',
       'info/num_steps_trained_this_iter', 'info/num_agent_steps_trained',
       'perf/cpu_util_percent', 'perf/ram_util_percent',
       'perf/gpu_util_percent0', 'perf/vram_util_percent0',
       'info/learner/default_policy/learner_stats/cumulative_regret',
       'info/learner/default_policy/learner_stats/update_latency', 'trial'],
      dtype='object')
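
For completeness, this is roughly how I am reading the columns (the PPO/CartPole config here is just a placeholder for my actual setup):

```python
from ray import tune

# placeholder run; my real config enables evaluation via "evaluation_interval"
analysis = tune.run(
    "PPO",
    config={"env": "CartPole-v1", "evaluation_interval": 5},
    stop={"training_iteration": 20},
)

# trial_dataframes maps each trial's logdir to a pandas DataFrame of its results
for logdir, df in analysis.trial_dataframes.items():
    print(df.columns)  # no "evaluation/..." columns show up here
```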

@gjoliver @sven1977 any ideas here?

Right, evaluation results are only computed once every n training iterations. So depending on when your Tune session stops, that copy of the result may not contain the evaluation result dict.

If you simply want the latest available eval results whenever you stop, you can set the flag always_attach_evaluation_results=True. It is a feature we introduced recently that buffers the latest copy of the eval results and attaches it to every single result dictionary before they are passed up to Tune.
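
A minimal sketch of what that could look like (exact config keys can differ slightly between RLlib versions, so treat this as an illustration rather than a copy-paste recipe):

```python
from ray import tune

analysis = tune.run(
    "PPO",
    config={
        "env": "CartPole-v1",
        # run an evaluation phase every 5 training iterations
        "evaluation_interval": 5,
        # buffer the latest eval results and attach them to every
        # result dict that gets reported to Tune
        "always_attach_evaluation_results": True,
    },
    stop={"training_iteration": 20},
)

# the evaluation metrics should now show up as "evaluation/..." columns
for logdir, df in analysis.trial_dataframes.items():
    print([c for c in df.columns if c.startswith("evaluation/")])
```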

We will update our documentation with this trick. Please also use the latest release of RLlib to make sure you get this feature.
