Hi, although the example below shows how to use a parametric action space for CartPole, I would like to ask whether we can use a continuous action space to produce the parametric action.
Currently, the policy network produces a latent action vector pi_t, and we obtain the discrete action distribution as dist = Discrete(dot(pi_t, e)), where e is the embedding of each available action. Instead, could we sample pi_t from a continuous action distribution (such as a Gaussian) and then dot it with all the action embeddings? This would introduce two action distributions (Gaussian + discrete), and I am not sure whether that is correct.
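To make the two schemes concrete, here is a minimal pure-Python sketch. It is not RLlib code, and all function names are hypothetical. It assumes each action embedding e is a vector of the same length as pi_t, that unavailable actions are masked out with a 0/1 mask, and that the Gaussian has a diagonal covariance whose mean `mu` and standard deviation `sigma` would come from the policy network. `discrete_probs` is the existing Discrete(dot(pi_t, e)) scheme, `gaussian_then_discrete` is the proposed two-stage sampling, and `marginal_probs_mc` estimates the resulting probability of each discrete action by averaging over Gaussian samples of pi_t.

```python
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def masked_softmax(logits, mask):
    """Softmax over logits; actions with mask == 0 get probability 0.
    Assumes at least one action is available."""
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    top = max(masked)
    exps = [math.exp(l - top) for l in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [x / total for x in exps]


def discrete_probs(pi_t, embeddings, mask):
    """Existing scheme: Discrete(dot(pi_t, e)) over the available actions."""
    logits = [dot(pi_t, e) for e in embeddings]
    return masked_softmax(logits, mask)


def sample_pi(mu, sigma, rng):
    """Hypothetical Gaussian stage: pi_t ~ N(mu, diag(sigma^2))."""
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]


def gaussian_then_discrete(mu, sigma, embeddings, mask, rng):
    """Proposed scheme: sample pi_t from the Gaussian, then sample an
    action from Discrete(dot(pi_t, e))."""
    pi_t = sample_pi(mu, sigma, rng)
    probs = discrete_probs(pi_t, embeddings, mask)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    return pi_t, action


def marginal_probs_mc(mu, sigma, embeddings, mask, rng, n_samples=2000):
    """Monte Carlo estimate of P(a) = E_{pi_t}[softmax(dot(pi_t, e))_a],
    the action distribution induced by the two stages together."""
    acc = [0.0] * len(embeddings)
    for _ in range(n_samples):
        pi_t = sample_pi(mu, sigma, rng)
        for i, p in enumerate(discrete_probs(pi_t, embeddings, mask)):
            acc[i] += p
    return [a / n_samples for a in acc]


if __name__ == "__main__":
    rng = random.Random(0)
    embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    mask = [1, 1, 0]  # third action is unavailable
    print(discrete_probs([2.0, 0.0], embeddings, mask))
    print(gaussian_then_discrete([2.0, 0.0], [0.5, 0.5], embeddings, mask, rng))
    print(marginal_probs_mc([2.0, 0.0], [0.5, 0.5], embeddings, mask, rng))
```

The Monte Carlo helper shows what the combined policy looks like: the probability of each discrete action is an expectation over pi_t (a mixture of softmax distributions), which in general has no closed form. As `sigma` shrinks toward zero, it collapses back to the existing Discrete(dot(pi_t, e)) scheme with pi_t = mu.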