How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Can any of the developers explain why some algorithms in master use `dynamic_tf_policy` while others use `dynamic_tf_policy_v2`? Is this just a leftover that will be changed over the next few weeks, or are there further challenges in implementing algorithms like DQN with `dynamic_tf_policy_v2`?
The reason I am asking is that I am implementing exploration algorithms, and for this I pass the `self._sess` attribute of the `dynamic_tf_policy_v2` to the `_create_exploration()` method of the `Policy`, so that the exploration has access to the policy's TF session.
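
To make the pattern concrete, here is a rough sketch using stand-in classes only. `FakeSession`, `SessionAwareExploration`, and `SketchPolicy` are hypothetical names made up for illustration; they are not RLlib classes and do not reproduce RLlib's real signatures. The only parts taken from the description above are the `_sess` attribute and the idea of handing it to `_create_exploration()`.

```python
# Stand-in classes only: they mimic the pattern described above,
# not RLlib's actual classes or signatures.


class FakeSession:
    """Hypothetical placeholder for the policy's TF session."""

    def run(self, fetch):
        # A real session would evaluate a tensor; here we just echo the request.
        return f"ran {fetch}"


class SessionAwareExploration:
    """Hypothetical exploration object that keeps a handle to the session."""

    def __init__(self, sess=None):
        self.sess = sess

    def evaluate(self, fetch):
        if self.sess is None:
            raise RuntimeError("no session was passed to the exploration")
        return self.sess.run(fetch)


class SketchPolicy:
    """Hypothetical policy that forwards its session to the exploration."""

    def __init__(self, sess):
        self._sess = sess
        self.exploration = self._create_exploration(sess=self._sess)

    def _create_exploration(self, sess=None):
        # Assumption: the extra `sess` argument is the modification
        # described in the post, not part of any stock API.
        return SessionAwareExploration(sess=sess)


policy = SketchPolicy(FakeSession())
result = policy.exploration.evaluate("intrinsic_reward")
```

With this wiring, the exploration and the policy share the same session object, so anything the exploration evaluates runs in the policy's session.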