How to solve an env with a very large action space

I would like to solve an env with a relatively large action space. The action is a position in a 2D np.array of size (1000, 200), which makes 200000 possible actions (outputs of the neural net).
I think that if I could somehow define the action as a column number and a row number, the neural network would need only 1200 output values. Is it possible to do this?
I analyzed the example but I'm not sure how to do it. Is it somehow connected with the list of action embeddings?
I understand the idea of observation encoding, but the idea of action encoding and how it is used is unclear to me.
So is there any example of how to define and use actions with two integers as output (column number and row number)?

Hi @Peter_Pirog ,

If you want your actions to be selected from the same space, i.e. 0…1200, this won’t make your space smaller if you select two actions on each timestep.
Here are two ideas:

  1. Sometimes large discrete action spaces are not “as discrete” as they sound. Meaning: if adjacent actions are very similar rather than completely different decisions, you can try modelling the space as continuous and simply rounding your action outputs to discrete numbers.
  2. You can still try this with one of the fancier algorithms. Plain DQN will likely not help you here, but you can give Rainbow a shot.
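A minimal sketch of the first idea, assuming the policy outputs a 2-element continuous action in [-1, 1] (for example from a Box space with shape (2,)) and the grid is the 1000 x 200 array from the question. The function name `box_to_cell` is made up for illustration and is not part of RLlib:

```python
N_ROWS, N_COLS = 1000, 200  # grid size taken from the question


def box_to_cell(action):
    """Map a continuous action in [-1, 1]^2 to an integer (row, col) cell."""
    cell = []
    for value, size in zip(action, (N_ROWS, N_COLS)):
        # Clip first, since a policy may emit values slightly out of bounds.
        clipped = min(1.0, max(-1.0, float(value)))
        # Rescale [-1, 1] to [0, size - 1] and round to the nearest index.
        cell.append(int(round((clipped + 1.0) / 2.0 * (size - 1))))
    return tuple(cell)


row, col = box_to_cell([-1.0, 1.0])  # first row, last column
```

The env would call something like this at the top of `step()` before indexing the array. This only makes sense if neighbouring cells really are similar decisions, as described in the first point.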



Any MultiDiscrete action space relates to what you are describing here. The two_step_example_game features this.
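As a rough illustration of how a (row, column) pair relates to the flat 200000-action encoding from the question, here is a small sketch. The helper names are hypothetical, and the space declaration in the comment assumes the usual gym-style `spaces.MultiDiscrete`:

```python
N_ROWS, N_COLS = 1000, 200  # grid size taken from the question

# In the env, the two-integer action could be declared as, e.g.:
#   self.action_space = spaces.MultiDiscrete([N_ROWS, N_COLS])
# The action then arrives in step() as a pair [row, col].


def pair_to_flat(row, col):
    """Convert a (row, col) pair to the equivalent flat index in 0..199999."""
    return row * N_COLS + col


def flat_to_pair(index):
    """Convert a flat index back to its (row, col) pair."""
    return divmod(index, N_COLS)


n_outputs_flat = N_ROWS * N_COLS  # 200000 outputs for one Discrete space
n_outputs_pair = N_ROWS + N_COLS  # 1200 outputs for the two-part space
```

Note that the number of distinct joint actions is still 200000; only the number of network outputs shrinks. As far as I know, the row and the column are then sampled from two separate categorical distributions, so whether this works well depends on how strongly the best column depends on the chosen row.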

@arturn Thank you for the answer, I will check it.

With only 50 discrete actions, RLlib was 20 to 50 times slower (on Windows 11) than with a Box action space, so I ended up defining the actions as a Box and rounding the numbers.

Sorry, it was MultiDiscrete (50 actions each with values from 1 to 20)

Hi @evo11x,

Can you elaborate on this?
Maybe provide a script and the exact change you made that led to a slowdown of 20x-50x?


Just add an action space like this to any environment and you should see the slowdown:

self.action_space = spaces.MultiDiscrete(np.full((50), 50))

compared to this:
self.action_space = spaces.Box(low=-1, high=1, shape=(50,), dtype=np.float32)

The slowdown was huge on Windows 11 and on Ubuntu it was slightly faster, but not by much.