[rllib] wrong action dimensions when using dictionary action space

I opened an issue a few days ago that has not received a response; maybe someone could help here.
The issue is that attention_net with a PPO policy and a dictionary action space does not seem to work.

thanks,
idan

Hi @homriidan,

The issue is coming from here:

For some reason, with a Dict action space, prev_actions is not expanded to account for the view requirement. You can see here that prev_actions only has the size of a single action (5) rather than the size the view requirement calls for (15 * 5 = 75). The first conditional in the else clause a few lines below shows how the view_requirement info is used to expand the size.

In the call to forward, if you look in the input dictionary to check the shapes you can confirm this:

input_dict["prev_rewards"].shape
torch.Size([32, 15])
input_dict["prev_actions"].shape
torch.Size([32, 5]) #<- this should be [32,75]
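
To make the arithmetic behind the expected shape concrete, here is a minimal sketch using plain NumPy rather than RLlib itself. It assumes a batch of 32, a flattened action size of 5, and a window of 15 previous steps, as in the shapes above; the variable names are hypothetical and it does not reproduce RLlib's actual view-requirement code.

```python
import numpy as np

# Values taken from the shapes reported in this thread.
batch_size = 32
flat_action_dim = 5   # size of one flattened Dict action
num_prev_steps = 15   # window length implied by prev_rewards: [32, 15]

# What was observed: only a single previous action per batch row.
observed_prev_actions = np.zeros((batch_size, flat_action_dim))

# What a windowed view would hold: one flattened action per previous step.
# (Hypothetical layout, for illustration only.)
windowed_prev_actions = np.zeros((batch_size, num_prev_steps, flat_action_dim))

# Flattening the time and action axes gives the width the model expects.
expected_prev_actions = windowed_prev_actions.reshape(batch_size, -1)

print(observed_prev_actions.shape)   # (32, 5)
print(expected_prev_actions.shape)   # (32, 75)
```

The mismatch between the two printed shapes is the same 5 vs. 75 discrepancy seen in input_dict["prev_actions"].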

I am not sure whether this was intentional (that is, dictionary spaces are not valid for view requirements that change the size) or whether it is a bug.

@sven1977 will know more.

Responded on the GitHub issue. Prepping a fix-it PR :slight_smile:

@mannyv @homriidan ^^
