I’m running DQN on a `Tuple([Box(shape=(1,)), Discrete(n=2)])` action space. If I bucketise the continuous action as `Discrete(n=11)` (say) and make the action space the product of the two discrete spaces, it becomes `Discrete(n=22)`. In that case, RLlib uses the `Categorical` TF action distribution and it runs.
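
For concreteness, a minimal sketch of the flattening/decoding I have in mind (the `Box` bounds of [-1, 1] and the helper names are just illustrative):

```python
import numpy as np
from gym.spaces import Box, Discrete, Tuple

# Original action space: a 1-D continuous action plus a binary discrete one.
orig_space = Tuple([Box(low=-1.0, high=1.0, shape=(1,)), Discrete(2)])

# Bucketise the Box into 11 bins; the product space then has 11 * 2 = 22 actions.
N_BINS, N_DISCRETE = 11, 2
flat_space = Discrete(N_BINS * N_DISCRETE)
BIN_CENTRES = np.linspace(-1.0, 1.0, N_BINS)  # assumed bounds

def decode(flat_action: int):
    """Map an index in [0, 22) back to (continuous value, discrete action)."""
    cont_idx, disc = divmod(flat_action, N_DISCRETE)
    return np.array([BIN_CENTRES[cont_idx]], dtype=np.float32), disc
```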
I would now like to override `Categorical`’s sampling operators and use the `(batch_size, 22)` logits tensor to output both the continuous action and the discrete action. I subclassed `Categorical` into a custom action distribution that does just that (note that the policy also needs to be subclassed so that `get_distribution_inputs_and_class` returns the right distribution class).
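
Roughly, the custom distribution looks like this (a sketch; names are illustrative, and in my actual code I route the class through the subclassed policy’s `get_distribution_inputs_and_class` rather than through registration):

```python
import tensorflow as tf
from ray.rllib.models import ModelCatalog
from ray.rllib.models.tf.tf_action_dist import Categorical

N_BINS, N_DISCRETE = 11, 2
BIN_CENTRES = tf.linspace(-1.0, 1.0, N_BINS)  # assumed Box bounds

class TupleCategorical(Categorical):
    """Categorical over 22 logits that emits (continuous, discrete) pairs."""

    def deterministic_sample(self):
        flat = tf.argmax(self.inputs, axis=1)  # (batch_size,), values in [0, 22)
        cont_idx = flat // N_DISCRETE
        disc = flat % N_DISCRETE
        cont = tf.gather(BIN_CENTRES, cont_idx)
        # Shape (batch_size, 2): [continuous value, discrete action].
        return tf.stack([cont, tf.cast(disc, tf.float32)], axis=1)

    # sample() is overridden the same way, decoding the stochastic draw.

# Alternative routing via the model catalog:
ModelCatalog.register_custom_action_dist("tuple_categorical", TupleCategorical)
```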
In particular, when I print the sampled tensor in my custom `deterministic_sample` function, I get the expected (and desired) shape `(batch_size, 2)`. However, the environment’s `step` function still receives an integer between 0 and 21, meaning the custom sampling is not being used.
I think it is also worth mentioning that, since RLlib first creates a fake action based on the action space to get things started, I also convert that received action inside the environment.
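
That is, something along these lines in the environment (a skeleton with placeholder dynamics, reusing the `decode()` helper from the first sketch):

```python
import gym
import numpy as np

class MyEnv(gym.Env):  # illustrative skeleton; real logic elided
    def step(self, action):
        # RLlib's initial dummy action (and, at the moment, every action I
        # receive) is a flat integer in [0, 22), so decode it as a fallback.
        if np.isscalar(action) or np.asarray(action).ndim == 0:
            action = decode(int(action))
        cont, disc = action
        obs, reward, done = np.zeros(4, dtype=np.float32), 0.0, True  # placeholders
        return obs, reward, done, {}
```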
Is there something I’m missing, e.g. something in the rollout worker or the sample batch?
Thanks a lot!