[RLlib] Store actions during training with PPOTrainer to get statistics about the action distribution over episodes

How severely does this issue affect your experience of using Ray?

  • Medium: It makes my task significantly more difficult, but I can work around it.

Best way to collect stats about actions taken during training?
I am running training loops with PPOTrainer, and I would like to know how the action distribution changes over training episodes. What is the best way to collect the actions taken during training?

My current training loop:

import pandas as pd

from ray.rllib.agents.ppo import PPOTrainer

# config is my PPO config dict (defined elsewhere).
rllib_trainer = PPOTrainer(config=config)

result_file = "episode_data.csv"
results = []
episode_data = []
num_iterations = 10

for n in range(num_iterations):
    result = rllib_trainer.train()
    results.append(result)

    # Store the relevant metrics from the result dict in the episode dict.
    episode = {
        "n": n,
        "episode_reward_min": result["episode_reward_min"],
        "episode_reward_mean": result["episode_reward_mean"],
        "episode_reward_max": result["episode_reward_max"],
        "episode_len_mean": result["episode_len_mean"],
    }
    episode_data.append(episode)

    # Write the accumulated results to disk after every iteration.
    result_df = pd.DataFrame(data=episode_data)
    result_df.to_csv(result_file, index=False)


This loop only gives me aggregate reward metrics. Can I somehow access the action distribution via the train() results?
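For context, one approach I have been considering is RLlib's custom callbacks. The sketch below is based on my reading of the custom-metrics example in the docs; it assumes the old ray.rllib.agents.callbacks.DefaultCallbacks API that goes with PPOTrainer, a single-agent setup, and a class name (ActionDistributionCallbacks) that I made up myself:

import numpy as np

from ray.rllib.agents.callbacks import DefaultCallbacks


class ActionDistributionCallbacks(DefaultCallbacks):
    """Record every action taken so the distribution can be inspected later."""

    def on_episode_start(self, *, worker, base_env, policies, episode, **kwargs):
        # Scratch space on the episode object for this episode's actions.
        episode.user_data["actions"] = []

    def on_episode_step(self, *, worker, base_env, episode, **kwargs):
        # last_action_for() returns the most recent action of the (default) agent.
        episode.user_data["actions"].append(episode.last_action_for())

    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # Lists stored in hist_data should be collected into result["hist_stats"];
        # scalar custom_metrics get aggregated (mean/min/max) in the results.
        episode.hist_data["actions"] = episode.user_data["actions"]
        # The mean is just an example scalar; for discrete actions a histogram
        # over hist_data is probably more meaningful.
        episode.custom_metrics["action_mean"] = float(np.mean(episode.user_data["actions"]))


config["callbacks"] = ActionDistributionCallbacks
rllib_trainer = PPOTrainer(config=config)

If I understand the docs correctly, the per-episode action lists should then show up under result["hist_stats"]["actions"] after each train() call, and the scalar under result["custom_metrics"], so I could log both in the loop above alongside the reward metrics. Is this the intended way to do it, or is there a more direct hook for getting the actions taken during training?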