Change from dynamic_tf_policy to dynamic_tf_policy_v2

How severe does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Can one of the developers explain why some algorithms on master use dynamic_tf_policy while others use dynamic_tf_policy_v2? Is this just a relic that will be changed over the next few weeks, or are there further challenges in implementing algorithms like DQN with dynamic_tf_policy_v2?

The reason I am asking is that I am implementing exploration algorithms, and for these I pass the self._sess attribute of the dynamic_tf_policy_v2 to the Policy’s _create_exploration() method so that the exploration has access to the policy’s tf session.
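
To make the pattern concrete, here is a minimal sketch of what I mean. It uses stand-in classes rather than RLlib's real ones: `ExplorationSketch`, `BasePolicySketch`, `SessionAwarePolicySketch`, and the `sess` argument are hypothetical names for illustration only. Just `_sess` and `_create_exploration()` are taken from the description above, and the real method may well have a different signature.

```python
# Stand-in classes only: they mimic the pattern and are NOT RLlib's real classes.


class ExplorationSketch:
    """Hypothetical exploration object that keeps a handle to a session."""

    def __init__(self, sess=None):
        self.sess = sess


class BasePolicySketch:
    """Stand-in for a policy base class that builds its own exploration."""

    def __init__(self):
        self.exploration = self._create_exploration()

    def _create_exploration(self):
        # The base version knows nothing about a session.
        return ExplorationSketch()


class SessionAwarePolicySketch(BasePolicySketch):
    """Stand-in for a v2-style policy that owns a session in `_sess`."""

    def __init__(self, sess):
        # Set the session first, because the base constructor
        # calls _create_exploration() right away.
        self._sess = sess
        super().__init__()

    def _create_exploration(self):
        # Hand the policy's session to the exploration object.
        return ExplorationSketch(sess=self._sess)


fake_session = object()  # placeholder for a real tf session
policy = SessionAwarePolicySketch(sess=fake_session)
```

With a policy built by subclassing, this override is a few lines in the subclass, which is why I would like to rely on the v2 policy for all algorithms.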

Maybe @sven1977 @avnishn @arturn @kourosh ?

Ah, that’s just me not having enough time to migrate everything over.
Functionality-wise, the two are identical right now. The v2 policies are simply subclassing-based, which makes it much easier to checkpoint by name in connector-enabled checkpoints.
I plan to finish the migration at some point soon.
That said, you should definitely chat with @kourosh when you are here; he is planning a bunch of pretty big updates to our policies and models.
