[RLlib] Store actions during training with PPOTrainer to get statistics about the action distribution over episodes

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Best way to collect stats about actions taken during training?
I am running training loops with PPOTrainer, and I would like to know how the action distribution changes over training episodes. What is the best way to get the actions taken during training?

My current training loop

import pandas as pd
from ray.rllib.agents.ppo import PPOTrainer

rllib_trainer = PPOTrainer(config=config)

episode_data = []
num_iterations = 10
result_file = "training_results.csv"

for n in range(num_iterations):
    result = rllib_trainer.train()

    # store relevant metrics from the result dict
    episode = {
        "n": n,
        "episode_reward_min": result["episode_reward_min"],
        "episode_reward_mean": result["episode_reward_mean"],
        "episode_reward_max": result["episode_reward_max"],
        "episode_len_mean": result["episode_len_mean"],
    }
    episode_data.append(episode)

    # write results to disk every iteration
    result_df = pd.DataFrame(data=episode_data)
    result_df.to_csv(result_file, index=False)

Can I somehow access the action-distribution via train() results?


Hi @Mirakolix_Gallier ,

No, action distributions are not exposed in the training results, since we don't count them as metrics.
You can use the ModelCatalog to retrieve the appropriate action distribution class for your case.
To obtain the inputs for the action distribution, you can…

  • set "output": "" in your config and, whenever you want to compute action distributions, read from there
  • after each training step, evaluate the policy manually on some observations of your own
  • modify RLlib's code to include the action_distribution_inputs in the results dict. → Have a look at PPO's training_step() method. The experiences passed around there contain the action distribution inputs.
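Whichever option you pick for logging, turning the collected data into per-iteration statistics is the same. A minimal NumPy sketch (not RLlib code): the softmax step assumes the default Categorical distribution over a discrete action space, and all names and numbers are illustrative:

```python
import numpy as np

def logits_to_probs(logits):
    """Convert action-distribution inputs (logits) to probabilities
    with a numerically stable softmax (Categorical case)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def empirical_action_distribution(actions, num_actions):
    """Fraction of times each discrete action was actually taken."""
    counts = np.bincount(actions, minlength=num_actions)
    return counts / counts.sum()

# Illustrative data: distribution inputs for 4 observations, 3 actions
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3],
                   [1.0, 1.0, 1.0],
                   [3.0, 0.0, 0.0]])
probs = logits_to_probs(logits)
mean_policy_dist = probs.mean(axis=0)  # average policy distribution this iteration

# Illustrative logged actions for the same iteration
actions = np.array([0, 1, 0, 0])
emp_dist = empirical_action_distribution(actions, num_actions=3)
# emp_dist is [0.75, 0.25, 0.0]
```

Appending `mean_policy_dist` or `emp_dist` to your `episode` dict each iteration gives you the distribution-over-training curve you are after.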