Implementation of RNN-SAC for continuous actions?

Hi there,

I was wondering if I could get some guidance on implementing the continuous-action version of RNN-SAC, since the torch implementation only supports the discrete case.

I have managed to convert the code from torch to TensorFlow, and it runs fine for discrete actions. However, the continuous implementation (which I wrote myself) does not seem to learn well compared to SAC-MLP, so I believe I may have done something wrong.

Below is the relevant part of my own “sacrnn_tf_policy”, where in get_distribution_and_class I simply add policy_t after the seq_lens argument.
Everything else is an exact conversion from torch to tf.

# Get a distribution class to be used with the just calculated dist-inputs.
action_dist_class = _get_dist_class(policy, policy.config, policy.action_space)
action_dist_t = action_dist_class(distribution_inputs, policy.model)
policy_t = action_dist_t.deterministic_sample()

# Feed the sampled action into the recurrent Q-net(s) along with their state-ins.
_, q_state_out = model.get_q_values(
    model_out, states_in["q"], seq_lens, policy_t)
if model.twin_q_net:
    _, twin_q_state_out = model.get_twin_q_values(
        model_out, states_in["twin_q"], seq_lens, policy_t)
else:
    twin_q_state_out = []
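
For reference, this is roughly the sampling pattern I would expect the continuous case to need: a reparameterized (stochastic) sample from a tanh-squashed Gaussian together with its log-prob, rather than only a deterministic sample. This is just a standalone TF2 sketch of that idea, not the RLlib API; the LOG_STD bounds are the usual SAC defaults and the shapes in the demo at the bottom are made up.

import tensorflow as tf
import numpy as np

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0

def squashed_gaussian_sample(mean, log_std, deterministic=False):
    """Sample a tanh-squashed Gaussian action and its log-prob."""
    log_std = tf.clip_by_value(log_std, LOG_STD_MIN, LOG_STD_MAX)
    std = tf.exp(log_std)
    if deterministic:
        pre_tanh = mean
    else:
        # Reparameterization trick: mean + std * N(0, 1).
        pre_tanh = mean + std * tf.random.normal(tf.shape(mean))
    action = tf.tanh(pre_tanh)
    # Diagonal Gaussian log-prob ...
    logp = -0.5 * tf.reduce_sum(
        ((pre_tanh - mean) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi),
        axis=-1)
    # ... with the tanh change-of-variables correction.
    logp -= tf.reduce_sum(
        2.0 * (np.log(2.0) - pre_tanh - tf.nn.softplus(-2.0 * pre_tanh)),
        axis=-1)
    return action, logp

if __name__ == "__main__":
    mean = tf.zeros((4, 2))      # batch of 4, 2-dim action space (placeholder)
    log_std = tf.zeros((4, 2))
    a, logp = squashed_gaussian_sample(mean, log_std)
    print(a.shape, logp.shape)   # (4, 2) (4,)

So one thing I am unsure about is whether deterministic_sample() is the right call for policy_t here during training, or whether the continuous case should be using a stochastic sample like the above.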