Change from dynamic_tf_policy to dynamic_tf_policy_v2

Lars_Simon_Zehnder · July 24, 2022, 7:19am

How severely does this issue affect your experience of using Ray?

None: Just asking a question out of curiosity

Can any of the developers explain why some algorithms on master use dynamic_tf_policy while others use dynamic_tf_policy_v2? Is this just a relic that will be changed over the next few weeks, or are there further challenges in implementing algorithms like DQN with dynamic_tf_policy_v2?

The reason I am asking is that I am implementing exploration algorithms, and for this I pass the self._sess attribute of dynamic_tf_policy_v2 to the Policy's _create_exploration() method so that the exploration has access to the policy's tf session.
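Roughly, the pattern looks like the sketch below. The classes are stand-ins so the snippet runs without Ray or TensorFlow: `FakeSession`, `SessionAwareExploration`, `BasePolicy`, and `MyPolicy` are hypothetical names, and only `_sess` and `_create_exploration()` are taken from the description above. It also assumes the session exists before the hook is called, which may not match the real class.

```python
class FakeSession:
    """Hypothetical stand-in for a TF1-style session (not TensorFlow's class)."""

    def run(self, fetches):
        # In this sketch a plain callable plays the role of a graph tensor.
        return fetches()


class SessionAwareExploration:
    """Hypothetical exploration object that keeps a handle to the policy's session."""

    def __init__(self, session):
        self.session = session

    def noise_scale(self):
        # Evaluate a value through the session shared with the policy.
        return self.session.run(lambda: 0.1)


class BasePolicy:
    """Stand-in for a subclassing-based policy; not RLlib's actual class."""

    def __init__(self):
        # Assumption: the session is created before the exploration hook runs.
        self._sess = FakeSession()
        self.exploration = self._create_exploration()

    def _create_exploration(self):
        # Default hook: no session-aware exploration.
        return None


class MyPolicy(BasePolicy):
    def _create_exploration(self):
        # Override the hook to forward the policy's session to the exploration.
        return SessionAwareExploration(session=self._sess)


policy = MyPolicy()
# The exploration and the policy now share the same session object.
assert policy.exploration.session is policy._sess
```

With a subclassing-based policy, the override is an ordinary method on the subclass, which is why the v2 class is convenient for this.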

Maybe @sven1977 @avnishn @arturn @kourosh ?

gjoliver · August 5, 2022, 7:44pm

Ah, that's just me not having enough time to migrate everything over.
Functionality-wise, the two are identical right now. The v2 policies are based on simple subclassing, which makes it much easier to checkpoint by name in connector-enabled checkpoints.
I plan to finish the migration at some point soon.
Although, you should definitely chat with @kourosh when you are here; he is planning a bunch of pretty big actual updates to our policy and models.
