I noticed that if I run policy.model.from_batch(obs) for DQN, the output is not the size of the action space but the size of the network's internal feature representation. Can you help me understand how to get the action distribution? Is it derived from the Q function? Apologies if this question is naive.
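
To make the question concrete, this is roughly what I imagine happens once per-action Q-values are available. It is a plain-Python sketch of my own understanding, not RLlib code: the function names and the example Q-values are hypothetical, and I do not know whether RLlib's DQN builds its action distribution this way.

```python
import math
from typing import List


def greedy_action(q_values: List[float]) -> int:
    """Return the index of the largest Q-value (the greedy action)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_probs(q_values: List[float], epsilon: float) -> List[float]:
    """Action probabilities under epsilon-greedy exploration.

    Every action gets epsilon / n, and the greedy action gets the
    remaining 1 - epsilon on top of that.
    """
    n = len(q_values)
    probs = [epsilon / n] * n
    probs[greedy_action(q_values)] += 1.0 - epsilon
    return probs


def softmax_probs(q_values: List[float], temperature: float = 1.0) -> List[float]:
    """Action probabilities from a softmax over Q-values (Boltzmann exploration)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical Q-values for a 3-action environment (assumed, not real output).
q = [0.5, 2.0, -1.0]
print(greedy_action(q))
print(epsilon_greedy_probs(q, epsilon=0.1))
print(softmax_probs(q))
```

If this picture is right, then what I am missing is the step that maps the feature vector I get from the model to the per-action Q-values used as input here.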