Variable-length / Parametric Action Spaces

Ethan · August 29, 2021, 9:38am

Hi,

I already have a custom model to preprocess my graph-like input.
Could I add another one for parametric action spaces? If I don’t use action_embedding_sz, what should I do?
For example, the maximum number of actions is 16, and I need to choose one action from [0, 1, …, x], where x varies but is never more than 16.
Thanks.

sven1977 · August 31, 2021, 7:04am

Hey @Ethan, great question. One solution could be to let your model set the logits of invalid actions to a very low value, e.g. FLOAT_MIN (from ray.rllib.utils.torch_ops import FLOAT_MIN). Then your Policy’s “Exploration” component (e.g. EpsilonGreedy for DQN) will automatically not pick those actions. Does this make sense, or is your setup more complicated than this? In this case, your model would have to interpret the given observation and work out which actions are valid and which are not (maybe you have something in your observation space that indicates this).
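
A minimal sketch of the masking idea, written with NumPy so it runs on its own. The names `mask_logits`, `num_valid` and `MAX_ACTIONS` are made up for illustration, and the local `FLOAT_MIN` is only a stand-in for the RLlib constant mentioned above. Inside an actual custom model you would apply the same logic to the output tensor, with `num_valid` derived from whatever your observation provides.

```python
import numpy as np

# Stand-in for RLlib's FLOAT_MIN (assumption: a very large negative float32).
FLOAT_MIN = np.finfo(np.float32).min

MAX_ACTIONS = 16  # fixed size of the model's output layer


def mask_logits(logits, num_valid):
    """Return a copy of `logits` with every index >= num_valid set to FLOAT_MIN."""
    logits = np.asarray(logits, dtype=np.float32)
    # True for valid action indices, False for the padded (invalid) ones.
    valid = np.arange(logits.shape[-1]) < num_valid
    return np.where(valid, logits, FLOAT_MIN)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
raw_logits = rng.normal(size=MAX_ACTIONS)       # pretend model output
masked = mask_logits(raw_logits, num_valid=5)   # only actions 0..4 are valid here

greedy_action = int(np.argmax(masked))  # always one of the valid indices
probs = softmax(masked)                 # invalid actions get probability 0
```

Because the output layer always has `MAX_ACTIONS` entries, the action space itself stays a fixed `Discrete(16)`. Only the mask changes from step to step.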
