[RLlib] Store actions during training with PPOTrainer to get statistics about the action distribution over episodes

How severely does this issue affect your experience of using Ray?

  • Medium: It makes my task significantly more difficult, but I can work around it.

Best way to collect stats about actions taken during training?
I am running training loops with PPOTrainer, and I would like to know how the action distribution changes over training episodes. What is the best way to collect the actions taken during training?

My current training loop:

import pandas as pd

from ray.rllib.agents.ppo import PPOTrainer

# config is my PPO config dict (defined elsewhere).
rllib_trainer = PPOTrainer(config=config)

result_file = "episode_data.csv"
results = []
episode_data = []
num_iterations = 10

for n in range(num_iterations):
    result = rllib_trainer.train()
    results.append(result)

    # Store the relevant metrics from the result dict in the episode dict.
    episode = {
        "n": n,
        "episode_reward_min": result["episode_reward_min"],
        "episode_reward_mean": result["episode_reward_mean"],
        "episode_reward_max": result["episode_reward_max"],
        "episode_len_mean": result["episode_len_mean"],
    }
    episode_data.append(episode)

    # Write the accumulated results to disk after every iteration.
    result_df = pd.DataFrame(data=episode_data)
    result_df.to_csv(result_file, index=False)


This loop only gives me aggregate reward metrics. Can I somehow access the action distribution via the train() results?
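For context, one approach I have been considering is RLlib's custom callbacks. The sketch below is based on my reading of the custom-metrics example in the docs; it assumes the old ray.rllib.agents.callbacks.DefaultCallbacks API that goes with PPOTrainer, a single-agent setup, and a class name (ActionDistributionCallbacks) that I made up myself:

import numpy as np

from ray.rllib.agents.callbacks import DefaultCallbacks


class ActionDistributionCallbacks(DefaultCallbacks):
    """Record every action taken so the distribution can be inspected later."""

    def on_episode_start(self, *, worker, base_env, policies, episode, **kwargs):
        # Scratch space on the episode object for this episode's actions.
        episode.user_data["actions"] = []

    def on_episode_step(self, *, worker, base_env, episode, **kwargs):
        # last_action_for() returns the most recent action of the (default) agent.
        episode.user_data["actions"].append(episode.last_action_for())

    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # Lists stored in hist_data should be collected into result["hist_stats"];
        # scalar custom_metrics get aggregated (mean/min/max) in the results.
        episode.hist_data["actions"] = episode.user_data["actions"]
        # The mean is just an example scalar; for discrete actions a histogram
        # over hist_data is probably more meaningful.
        episode.custom_metrics["action_mean"] = float(np.mean(episode.user_data["actions"]))


config["callbacks"] = ActionDistributionCallbacks
rllib_trainer = PPOTrainer(config=config)

If I understand the docs correctly, the per-episode action lists should then show up under result["hist_stats"]["actions"] after each train() call, and the scalar under result["custom_metrics"], so I could log both in the loop above alongside the reward metrics. Is this the intended way to do it, or is there a more direct hook for getting the actions taken during training?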