Using different get_exploration_action logic pre and post training

Hey team,

I have created a custom exploration class for the problem I am trying to solve. I want to use two different get_exploration_action behaviors for the following two parts of my code:

  • PPO training
  • compute_action during post-training simulation of episodes from the learned policy.

Is there a way to

  • modify config["exploration_config"]["type"] after training, or
  • add a custom argument to the call to get_exploration_action?

If neither is possible, can you please suggest a way to implement such a setup?
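For context, here is a minimal sketch of the kind of class I have (torch framework; `MyCustomExploration` and the branch bodies are placeholders):

```python
import torch
from ray.rllib.utils.exploration.exploration import Exploration


class MyCustomExploration(Exploration):  # placeholder name
    """Sketch: different behavior for training vs. post-training rollouts."""

    def get_exploration_action(self, *, action_distribution, timestep, explore=True):
        # `explore` is the flag RLlib threads through from
        # compute_action(s) / the `explore` config setting.
        if explore:
            # ... custom PPO-training-time sampling would go here ...
            action = action_distribution.sample()
            logp = action_distribution.sampled_action_logp()
        else:
            # ... custom post-training behavior would go here ...
            action = action_distribution.deterministic_sample()
            logp = torch.zeros_like(action_distribution.sampled_action_logp())
        return action, logp
```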

cc:
@sven1977 , @RickLan , @mannyv , @arturn , @RickDW , @rusu24edward , @gjoliver

@Saurabh_Arora,

I have not tried it with 2.x, but I would think you could just instantiate a new algorithm/policy with an updated config that changes the exploration type. That would not work for the exploration types that train a model of their own (Random Encoder and Curiosity), because the checkpoint would not be able to match up the weights and would error out, but most explorations don't do that.
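Untested, but on a recent 2.x (old API stack) I would expect something along these lines, swapping in the built-in StochasticSampling as an example; the env name and checkpoint path are placeholders:

```python
import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOConfig

# Rebuild the training config, but swap the exploration type
# (any exploration that adds no trainable sub-model of its own
# should restore cleanly from the training checkpoint).
config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder: your training env
    .exploration(exploration_config={"type": "StochasticSampling"})
)

algo = config.build()
algo.restore("/path/to/checkpoint")  # placeholder: your training checkpoint

# Roll out the learned policy with the new exploration behavior.
env = gym.make("CartPole-v1")
obs, info = env.reset()
action = algo.compute_single_action(obs, explore=True)
```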
