I was wondering if I could get some guidance on implementing the continuous-action version of RNN-SAC, since the Torch implementation only supports the discrete case.
I have managed to port the code from Torch to TensorFlow, and it runs fine with discrete actions, but my own continuous implementation does not seem to learn nearly as well as SAC-MLP. I believe I may have done something wrong.
Below is the code from my own "sacrnn_tf_policy". In get_distribution_and_class I simply add policy_t after the seq-lens argument; everything else is a direct conversion from Torch to TF.
# Get a distribution class to be used with the just calculated dist-inputs.
action_dist_class = _get_dist_class(policy, policy.config,
                                    policy.action_space)
action_dist_t = action_dist_class(distribution_inputs, policy.model)
policy_t = action_dist_t.deterministic_sample()

_, q_state_out = model.get_q_values(model_out, states_in["q"],
                                    seq_lens, policy_t)
if model.twin_q_net:
    _, twin_q_state_out = \
        model.get_twin_q_values(model_out, states_in["twin_q"],
                                seq_lens, policy_t)
else:
    twin_q_state_out =
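One thing worth double-checking in a discrete-to-continuous port: in the continuous case, deterministic_sample() for SAC is typically the tanh-squashed mean of the Gaussian policy rescaled to the action bounds, whereas the discrete case is just an argmax over logits. A minimal pure-Python sketch of that squashing step (this helper and its bounds are illustrative only, not part of RLlib):

```python
import math

def squashed_deterministic_action(mean, low, high):
    """Illustrative helper (not RLlib code): compute the deterministic
    continuous SAC action by tanh-squashing each Gaussian mean into
    (-1, 1), then rescaling to the per-dimension action bounds."""
    actions = []
    for m, lo, hi in zip(mean, low, high):
        squashed = math.tanh(m)  # squash unbounded mean into (-1, 1)
        # Linearly rescale (-1, 1) to (lo, hi).
        actions.append(lo + (squashed + 1.0) * 0.5 * (hi - lo))
    return actions

# Example: an unbounded mean of 0.0 maps to the midpoint of the bounds.
print(squashed_deterministic_action([0.0, 2.0], [-2.0, -2.0], [2.0, 2.0]))
```

If the distribution class returned by _get_dist_class does not apply this squashing for your continuous action space, the policy_t fed into get_q_values can fall outside the action bounds, which would plausibly explain the poor learning.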