I was wondering if I could get some guidance on implementing the continuous-action version of RNN-SAC, since the Torch implementation only supports the discrete case.
I have managed to port the code from Torch to TensorFlow, and it runs fine with discrete actions, but my own continuous implementation does not seem to learn nearly as well as SAC-MLP. I believe I may have done something wrong.
Below is the code from my own "sacrnn_tf_policy". In get_distribution_and_class I simply add policy_t after the seq-lens argument; everything else is a direct conversion from Torch to TF.
# Get a distribution class to be used with the just calculated dist-inputs.
action_dist_class = _get_dist_class(policy, policy.config,
                                    policy.action_space)
action_dist_t = action_dist_class(distribution_inputs, policy.model)
policy_t = action_dist_t.deterministic_sample()

_, q_state_out = model.get_q_values(model_out, states_in["q"],
                                    seq_lens, policy_t)
if model.twin_q_net:
    _, twin_q_state_out = \
        model.get_twin_q_values(model_out, states_in["twin_q"],
                                seq_lens, policy_t)
else:
    twin_q_state_out =
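One thing worth double-checking in a discrete-to-continuous port: in the continuous case, deterministic_sample() for SAC is typically the tanh-squashed mean of the Gaussian policy rescaled to the action bounds, whereas the discrete case is just an argmax over logits. A minimal pure-Python sketch of that squashing step (this helper and its bounds are illustrative only, not part of RLlib):

```python
import math

def squashed_deterministic_action(mean, low, high):
    """Illustrative helper (not RLlib code): compute the deterministic
    continuous SAC action by tanh-squashing each Gaussian mean into
    (-1, 1), then rescaling to the per-dimension action bounds."""
    actions = []
    for m, lo, hi in zip(mean, low, high):
        squashed = math.tanh(m)  # squash unbounded mean into (-1, 1)
        # Linearly rescale (-1, 1) to (lo, hi).
        actions.append(lo + (squashed + 1.0) * 0.5 * (hi - lo))
    return actions

# Example: an unbounded mean of 0.0 maps to the midpoint of the bounds.
print(squashed_deterministic_action([0.0, 2.0], [-2.0, -2.0], [2.0, 2.0]))
```

If the distribution class returned by _get_dist_class does not apply this squashing for your continuous action space, the policy_t fed into get_q_values can fall outside the action bounds, which would plausibly explain the poor learning.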