Variable-length / Parametric Action Spaces

Hi,

I already have a custom model that preprocesses my graph-like input.
Could I add another one for parametric action spaces? If I don't use action_embedding_sz, what should I do?
For example, the maximum number of actions is 16.
I need to choose one action from [0,1,…,x], where x varies but is never more than 16.
Thanks.


Hey @Ethan, great question. One solution could be to let your model set the logits of invalid actions to a very large negative value, e.g. FLOAT_MIN (from ray.rllib.utils.torch_ops import FLOAT_MIN). Your Policy's "Exploration" component (e.g. EpsilonGreedy for DQN) will then automatically not pick those actions. In this case your model would have to interpret the given observation and work out which actions are valid and which are not (maybe you have something in your observation space that indicates this). Does this make sense, or is your setup more complicated than this?
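To make the masking idea concrete, here is a minimal sketch in plain Python, without RLlib or torch. It assumes the number of currently valid actions (called num_valid here) can be read from the observation. The names mask_logits, softmax, greedy_action and MAX_ACTIONS are made up for illustration, and FLOAT_MIN is a local stand-in for the constant mentioned above, not the imported one.

```python
import math

# Local stand-in for a very negative float32-like value (assumption: the
# library constant plays the same role; this is not imported from RLlib).
FLOAT_MIN = -3.4e38
MAX_ACTIONS = 16  # fixed size of the action space, from the question


def mask_logits(logits, num_valid):
    """Keep the first `num_valid` logits and push the rest to FLOAT_MIN."""
    if len(logits) != MAX_ACTIONS:
        raise ValueError("expected one logit per action slot")
    if not 1 <= num_valid <= MAX_ACTIONS:
        raise ValueError("num_valid must be between 1 and MAX_ACTIONS")
    return [x if i < num_valid else FLOAT_MIN for i, x in enumerate(logits)]


def softmax(logits):
    """Numerically stable softmax; masked entries underflow to exactly 0.0."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Example: only actions 0..4 are valid in this step.
raw_logits = [0.1 * i for i in range(MAX_ACTIONS)]
masked = mask_logits(raw_logits, num_valid=5)
probs = softmax(masked)
best = greedy_action(masked)
```

Without the mask the greedy choice here would be action 15. With it, the choice falls on the best of the five valid actions, and the masked actions get zero probability under a softmax. In an actual torch model the same effect would usually come from applying the mask to the output logits tensor before returning it, but that part depends on your model and is not shown here.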
