How to get DQN action distribution

Hi,

I noticed that if I run policy.model.from_batch(obs) for DQN, the output is not the size of the action space but rather the size of the network's internal feature representation. Can you help me understand how to get the action distribution? Is it derived from the Q-function? Apologies if this question is naive.

Thanks,
Sam

Use the compute_actions / compute_single_action methods.
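To your question about whether the distribution is derived from the Q-function: yes, for DQN the action "preferences" are just the per-action Q-values, and a categorical distribution can be formed from them (e.g. via a softmax, as in Boltzmann exploration). Note that plain DQN acts greedily or epsilon-greedily over Q-values rather than sampling from a softmax, so the temperature-softmax below is an illustrative sketch, not RLlib's exact exploration code:

```python
import numpy as np

def boltzmann_distribution(q_values, temperature=1.0):
    """Turn per-action Q-values into a categorical action
    distribution via a temperature softmax (Boltzmann).
    This illustrates how a distribution can be derived from
    the Q function; it is not RLlib's internal code."""
    # Subtract the max for numerical stability before exponentiating.
    z = (np.asarray(q_values, dtype=np.float64) - np.max(q_values)) / temperature
    exp_q = np.exp(z)
    return exp_q / exp_q.sum()

q = np.array([1.0, 2.0, 0.5])   # hypothetical Q-values for 3 actions
probs = boltzmann_distribution(q)
```

In RLlib specifically, the policy's extra action outputs for DQN include the raw Q-values (if I recall correctly, under a "q_values" key when you call compute_single_action with full_fetch=True), which you can then normalize yourself as above.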

How do I get the distribution of actions from compute_actions or compute_single_action? The documentation says they return only the best action.

This would be really important: if your network returns an action that is impossible in the environment, how would you know the next-best action to try instead?
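For the fallback use case, you don't strictly need a distribution: once you have the per-action Q-values, ranking them gives you a preference ordering, so you can step down to the next-best action whenever the top choice is invalid. A minimal sketch (the Q-values here are hypothetical stand-ins for whatever your policy returns):

```python
import numpy as np

def actions_by_preference(q_values):
    """Return action indices sorted from best to worst Q-value,
    so the caller can fall back to the next-best action when
    the top choice is invalid in the environment."""
    return list(np.argsort(q_values)[::-1])

q = np.array([0.2, 1.5, -0.3, 0.9])  # hypothetical Q-values
ranking = actions_by_preference(q)   # best action first
```

You would then iterate through the ranking, trying each action until one is valid. (If your environment exposes its action mask up front, masking the Q-values before the argmax is usually cleaner than trial and error.)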