Hi there, I’d like to disallow impossible actions in a multi-env setup. I’ve read the doc (RLlib Models, Preprocessors, and Action Distributions — Ray v2.0.0.dev0) but I don’t understand the purposes of avail_actions and action_mask. Could someone explain them to me, please?
I’m in a similar situation. Disclaimer: I know very little about RL, this is just what I’ve pieced together over a few hours googling.
avail_actions seems to be there for action embeddings. If you follow the links in the docs far enough, you’ll get to the parametric-actions example.
action_mask is what we really want. Unfortunately, that example interweaves it with action embedding.
I would imagine you could delete self.action_assignments and its friends to get down to base, mask-only functionality. You’d also need to modify ParametricActionsModel, since it expects avail_actions in the observations and uses it to compute intent_vector, and thus the logits.
The theory here seems simple: to mask, just intercept forward calls and make the logits for masked/invalid actions very negative. I’m not sure why I can’t crack it; probably a silly dimensions issue.
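For what it’s worth, the masking part can be demonstrated entirely outside RLlib. Here’s a minimal standalone sketch in PyTorch; the function name mask_logits is mine, not anything from the library:

```python
import torch

# A very negative but finite value: log(0) is -inf, and -inf in the logits
# can produce NaNs later (e.g. inf - inf), so we clamp to float32's minimum.
FLOAT_MIN = torch.finfo(torch.float32).min

def mask_logits(logits: torch.Tensor, action_mask: torch.Tensor) -> torch.Tensor:
    """Push logits of invalid actions (mask == 0) to a huge negative value.

    After softmax, those actions get effectively zero probability.
    """
    inf_mask = torch.clamp(torch.log(action_mask), min=FLOAT_MIN)
    return logits + inf_mask

logits = torch.tensor([[1.0, 2.0, 3.0]])
mask = torch.tensor([[1.0, 0.0, 1.0]])  # middle action is forbidden
probs = torch.softmax(mask_logits(logits, mask), dim=-1)
# the masked action's probability is (numerically) zero
```

In a custom model this would go at the end of forward(), applied to the raw policy logits before they’re returned.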
There’s a good blog post on this, but it only has one line on avail_actions: “The available actions correspond to each of the five items the agent can select for packing.”
The author seems to work around avail_actions rather than excising it; they always set it to ones. Maybe that’s the easier approach.
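If you go that route, the observation just needs to keep the avail_actions slot the model expects while filling it with ones, so the embeddings carry no signal and only action_mask restricts the policy. Something like this; build_obs and its parameters are hypothetical, just to illustrate the shapes:

```python
import numpy as np

def build_obs(real_obs: np.ndarray, valid: np.ndarray,
              num_actions: int, embed_dim: int) -> dict:
    """Observation dict for an avail_actions-style model, with the
    embedding slot neutralized (all ones) so only the mask matters."""
    return {
        "obs": real_obs,
        # one embedding row per action, all ones -> every action "looks" identical
        "avail_actions": np.ones((num_actions, embed_dim), dtype=np.float32),
        # 1.0 = allowed, 0.0 = forbidden
        "action_mask": valid.astype(np.float32),
    }

obs = build_obs(np.zeros(4, dtype=np.float32), np.array([1, 0, 1]), 3, 2)
```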
If any maintainers read this, I’d love to see an example with action masking and embedding separated. I’m sure it’s painfully obvious to experts how to separate them.
Thank you very much for your answer. I’ve read through the article, but unfortunately it doesn’t explain the tricky parts at all; I still have no idea what action embedding is. I managed to mask out impossible actions using action_mask like this:
inf_mask = torch.clamp(torch.log(action_mask), FLOAT_MIN, FLOAT_MAX)
return output + inf_mask
(it’s in an actor-critic network; output holds the logits behind the policy).
But I wonder whether I’m missing something important needed to make everything work with avail_actions and action embedding.
Yeah, I sympathize. I still don’t quite grok it either, but I did find this post a bit enlightening: https://neuro.cs.ut.ee/the-use-of-embeddings-in-openai-five/
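As far as I can tell, the core of the embedding idea is just a dot product between the network’s “intent vector” and a per-action embedding: the more an action’s embedding aligns with the intent, the higher its logit. A rough sketch, following the variable names in the RLlib example but written as my own simplification:

```python
import torch

def embedding_logits(intent_vector: torch.Tensor,
                     avail_actions: torch.Tensor) -> torch.Tensor:
    """Score each action by the dot product of its embedding with the intent.

    intent_vector: [batch, embed_dim]
    avail_actions: [batch, num_actions, embed_dim]
    returns logits: [batch, num_actions]
    """
    return torch.sum(avail_actions * intent_vector.unsqueeze(1), dim=2)

intent = torch.tensor([[1.0, 0.0]])
embeds = torch.tensor([[[1.0, 0.0],    # aligned with intent
                        [0.0, 1.0],    # orthogonal to intent
                        [2.0, 0.0]]])  # strongly aligned
logits = embedding_logits(intent, embeds)  # tensor([[1., 0., 2.]])
```

This is why the example needs avail_actions in the observation at all: the logits come from those embeddings, not directly from a final dense layer, and the mask is then added on top.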
I love your article!! For a long time I’ve wanted to see an application of attention mechanisms to reinforcement learning, and also an application of reinforcement learning to a complex, variable observation space as well as a complex, variable action space.