Using different get_exploration_action logic pre and post training

Hey team,

I have created a custom exploration class for the problem I am trying to solve. I want to use two different get_exploration_action behaviors for the following two parts of my code:

  • PPO training
  • compute_action during post-training simulation of episodes from the learned policy.

Is there a way to

  • modify config["exploration_config"]["type"] after training, or
  • add a custom argument to the call to get_exploration_action?

If neither is possible, can you please suggest a way to implement such a setup?
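For context, here is a minimal sketch of the kind of class I have (torch framework; `MyCustomExploration` and the branch bodies are placeholders):

```python
import torch
from ray.rllib.utils.exploration.exploration import Exploration


class MyCustomExploration(Exploration):  # placeholder name
    """Sketch: different behavior for training vs. post-training rollouts."""

    def get_exploration_action(self, *, action_distribution, timestep, explore=True):
        # `explore` is the flag RLlib threads through from
        # compute_action(s) / the `explore` config setting.
        if explore:
            # ... custom PPO-training-time sampling would go here ...
            action = action_distribution.sample()
            logp = action_distribution.sampled_action_logp()
        else:
            # ... custom post-training behavior would go here ...
            action = action_distribution.deterministic_sample()
            logp = torch.zeros_like(action_distribution.sampled_action_logp())
        return action, logp
```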

cc:
@sven1977 , @RickLan , @mannyv , @arturn , @RickDW , @rusu24edward , @gjoliver

@Saurabh_Arora,

I have not tried it with 2.x, but I would think you could just instantiate a new algorithm/policy with an updated config that changes the exploration type. That would not work for the exploration types that train a model of their own (Random Encoder and Curiosity), because the checkpoint would not be able to match up the weights and would error out, but most explorations don't do that.
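Untested, but on a recent 2.x (old API stack) I would expect something along these lines, swapping in the built-in StochasticSampling as an example; the env name and checkpoint path are placeholders:

```python
import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOConfig

# Rebuild the training config, but swap the exploration type
# (any exploration that adds no trainable sub-model of its own
# should restore cleanly from the training checkpoint).
config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder: your training env
    .exploration(exploration_config={"type": "StochasticSampling"})
)

algo = config.build()
algo.restore("/path/to/checkpoint")  # placeholder: your training checkpoint

# Roll out the learned policy with the new exploration behavior.
env = gym.make("CartPole-v1")
obs, info = env.reset()
action = algo.compute_single_action(obs, explore=True)
```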
