[RLlib] Impossible actions

Jogima-cyber · February 15, 2021, 3:35pm

Hi there, I’d like to declare impossible actions in a multienv. I’ve read the docs (https://docs.ray.io/en/master/rllib-models.html#variable-length-parametric-action-spaces), but I don’t understand the purpose of avail_actions and action_mask. Could someone explain them to me, please?

jahabrewer · February 21, 2021, 8:08pm

I’m in a similar situation. Disclaimer: I know very little about RL, this is just what I’ve pieced together over a few hours googling.

avail_actions seems to be there for action embeddings. If you follow links in the docs enough, you’ll get to ParametricActionsCartPole. action_mask is what we really want. Unfortunately, this example interweaves it with action embedding.

I would imagine you could delete self.action_assignments and its friends to get to base, mask-only functionality. You’d also need to modify ParametricActionsModel, since it expects avail_actions in observations and uses it to compute intent_vector, and thus action_logits.

The theory here seems simple: to mask, just intercept forward calls and make the logits for masked/invalid actions very negative. I’m not sure why I can’t crack it. Probably a silly dimensions issue.

There’s a good blog post on this, but it only has one line on avail_actions:

The available actions correspond to each of the five items the agent can select for packing.

The author seems to work around avail_actions rather than excising it; they always set it to ones. Maybe that’s the easier approach.

If any maintainers read this, I’d love to see an example with action masking and embedding separated. I’m sure it’s painfully obvious to experts how to separate them.

Jogima-cyber · February 21, 2021, 8:53pm

Thank you very much for your answer. I’ve read through the article, but unfortunately it doesn’t explain the tricky parts at all. I still have no idea what action embedding is. I managed to mask out impossible actions by using action_mask like this:

    inf_mask = torch.clamp(torch.log(action_mask), FLOAT_MIN, FLOAT_MAX)
    return output+inf_mask, []

(This is in an actor-critic network; output holds the logits behind the policy.)
But I wonder whether I’m missing something important that is needed to make everything work with avail_actions and action embeddings.
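
For readers who want to see the clamped-log trick from the snippet above in isolation, here is a minimal sketch in plain Python rather than torch. The FLOAT_MIN and FLOAT_MAX values are assumptions defined locally for illustration, the helper names mask_logits and softmax are made up for this sketch, and action_mask is assumed to contain only 0s and 1s.

```python
import math

# Assumed constants: finite stand-ins for -inf / +inf, defined locally here.
FLOAT_MIN = -3.4e38
FLOAT_MAX = 3.4e38


def mask_logits(logits, action_mask):
    """Add clamp(log(mask)) to each logit so masked entries become hugely negative."""
    masked = []
    for logit, m in zip(logits, action_mask):
        # log(1) = 0 leaves the logit unchanged; log(0) is treated as -inf.
        log_m = math.log(m) if m > 0 else -math.inf
        # Clamp so that -inf becomes the finite FLOAT_MIN.
        log_m = min(max(log_m, FLOAT_MIN), FLOAT_MAX)
        masked.append(logit + log_m)
    return masked


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 0.5, -1.0]
action_mask = [1.0, 0.0, 1.0, 1.0]  # action 1 is impossible in this state
probs = softmax(mask_logits(logits, action_mask))
print(probs)
```

Because the masked logit is a very large negative finite number rather than -inf, the arithmetic stays free of NaNs, while the softmax probability of the impossible action still underflows to zero.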

jahabrewer · February 22, 2021, 5:08pm

Yeah, I sympathize. I still don’t quite grok, but I did find this post a bit enlightening: https://neuro.cs.ut.ee/the-use-of-embeddings-in-openai-five/

Jogima-cyber · February 26, 2021, 8:58am

I love your article!! It’s been a long time since I wanted to have an application of attention mechanisms to reinforcement learning. And also to have an application of reinforcement learning to a complex and variable space of observation as well as to a complex and variable space of action.

klausk55 · April 20, 2021, 1:34pm

This “available actions thing” seems to be helpful when you have to handle huge action spaces and/or a number of available actions that varies between steps.
A small example of how I interpret it:
all_actions = {0, 1, 2, 3, 4, 5}, n=6 total number of actions, action_embedding_size=2
=> the action embedding matrix E is 6x2
m=3 < n actions are available at a specific timestep, e.g. avail_actions=(0, 2, 4)
=> the action embedding for avail_actions is a matrix E* composed of the 1st, 3rd and 5th rows of E
Finally, you calculate the dot product of E* and the intent vector from the NN (which has length action_embedding_size)

Code example showing the embeddings:

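import numpy
import tensorflow as tf

# One embedding layer: 6 possible actions, each mapped to a 2-dimensional vector.
model = tf.keras.Sequential()
embed = tf.keras.layers.Embedding(6, 2)
model.add(embed)
model.summary()
print(embed.get_weights())  # the full 6x2 embedding matrix E

# Indices of the actions available at this timestep.
avail_actions = numpy.asarray_chkfinite([0, 2, 4])
model.compile("rmsprop", "mse")
out = model.predict(avail_actions)  # looks up rows 0, 2 and 4 of E
for i in range(out.shape[0]):
    print("{}     {}".format(avail_actions[i], out[i]))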

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, None, 2)           12
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
[array([[-0.04093417, -0.02362244],
       [-0.01528452, -0.02044444],
       [ 0.04733466,  0.01246139],
       [ 0.01975517,  0.02948004],
       [ 0.03812562,  0.0137356 ],
       [ 0.04121368, -0.01421856]], dtype=float32)]
2021-04-20 15:25:32.082275: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
0     [[-0.04093417 -0.02362244]]
2     [[0.04733466 0.01246139]]
4     [[0.03812562 0.0137356 ]]
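
The Keras snippet above covers only the embedding lookup. The final step of the interpretation, the dot product between the intent vector and E*, can be sketched in plain Python as follows. The embedding values are rounded from the printed weights, and intent_vector is a made-up placeholder (in a real model it would be produced by the network).

```python
# Embedding matrix E (6 actions x 2 dims), rounded from the printed weights.
E = [
    [-0.041, -0.024],
    [-0.015, -0.020],
    [0.047, 0.012],
    [0.020, 0.029],
    [0.038, 0.014],
    [0.041, -0.014],
]

avail_actions = [0, 2, 4]              # actions available at this timestep
E_star = [E[a] for a in avail_actions]  # rows 1, 3 and 5 of E

# Hypothetical intent vector with length action_embedding_size = 2.
intent_vector = [1.0, -2.0]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# One logit per available action: E* times the intent vector.
action_logits = [dot(row, intent_vector) for row in E_star]

# Map the best-scoring row back to the original action id.
best_row = max(range(len(action_logits)), key=action_logits.__getitem__)
best_action = avail_actions[best_row]
print(action_logits, best_action)
```

Note that the number of logits equals the number of available actions (3 here), not the total number of actions, which is what lets this scheme handle a varying set of candidates.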

Bam4d · April 20, 2021, 3:02pm

I authored a paper that makes heavy use of invalid action masking in complex action spaces.

All the examples use Griddly and RLlib.
Paper: [2104.07294] Generalising Discrete Action Spaces with Conditional Action Trees
RLlib Code: GitHub - Bam4d/conditional-action-trees: Example Code for the Conditional Action Trees Paper

It might shed some light on action masking, why it’s required, and how you can apply it.

vlainic · March 28, 2022, 6:26pm

This statement describes exactly my problem: I do not see the purpose of self.action_embed_model in the PA model if one only wants to mask. But I cannot get mock-up code to run without self.action_embed_model.

Did you get it working in the end, @jahabrewer?

vlainic · March 29, 2022, 7:27pm

Okay, I am blind… I somehow skimmed right past the first file in the repo folder:

github.com

ray-project/ray/blob/master/rllib/examples/models/action_mask_model.py

from gym.spaces import Dict

from ray.rllib.models.tf.fcnet import FullyConnectedNetwork
from ray.rllib.models.tf.tf_modelv2 import TFModelV2
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork as TorchFC
from ray.rllib.utils.framework import try_import_tf, try_import_torch
from ray.rllib.utils.torch_utils import FLOAT_MIN

tf1, tf, tfv = try_import_tf()
torch, nn = try_import_torch()


class ActionMaskModel(TFModelV2):
    """Model that handles simple discrete action masking.

    This assumes the outputs are logits for a single Categorical action dist.
    Getting this to work with a more complex output (e.g., if the action space
    is a tuple of several distributions) is also possible but left as an
    exercise to the reader.

This file has been truncated.
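
Since the file preview is cut off, here is a small sketch of the environment side of mask-only action masking: a dict observation that carries a per-action validity mask next to the real observation. The toy 1-D world, the function name build_observation, and the key names action_mask and observations are assumptions for illustration; check the full file for the keys the model actually reads.

```python
def build_observation(position, length):
    """Build a dict observation for a toy 1-D world with `length` cells.

    Actions: 0 = move left, 1 = move right, 2 = stay.
    Key names are assumed for this sketch, not taken from the truncated file.
    """
    action_mask = [
        1.0 if position > 0 else 0.0,           # cannot move left at the left edge
        1.0 if position < length - 1 else 0.0,  # cannot move right at the right edge
        1.0,                                    # staying is always allowed
    ]
    return {
        "action_mask": action_mask,
        "observations": [position / (length - 1)],  # normalised position
    }


print(build_observation(0, 5))
print(build_observation(4, 5))
```

A model can then read the mask from the observation and push the logits of the zero-mask actions towards a very negative value before sampling.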

James_Liu · March 30, 2022, 7:28pm

Thanks for the example. I am wondering if there is any way of action masking for continuous action spaces?

sirjay · April 6, 2022, 8:18pm

Does anyone know the difference between class ParametricActionsModel(DistributionalQTFModel) and class ActionMaskModel(TFModelV2)? I guess both mask invalid actions.

github.com

ray-project/ray/blob/7f1bacc7dc9caf6d0ec042e39499bbf1d9a7d065/rllib/examples/models/parametric_actions_model.py

from gym.spaces import Box

from ray.rllib.agents.dqn.distributional_q_tf_model import DistributionalQTFModel
from ray.rllib.agents.dqn.dqn_torch_model import DQNTorchModel
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork as TorchFC
from ray.rllib.utils.framework import try_import_tf, try_import_torch
from ray.rllib.utils.torch_utils import FLOAT_MIN, FLOAT_MAX

tf1, tf, tfv = try_import_tf()
torch, nn = try_import_torch()


class ParametricActionsModel(DistributionalQTFModel):
    """Parametric action model that handles the dot product and masking.

    This assumes the outputs are logits for a single Categorical action dist.
    Getting this to work with a more complex output (e.g., if the action space
    is a tuple of several distributions) is also possible but left as an
    exercise to the reader.

This file has been truncated.

github.com

ray-project/ray/blob/7f1bacc7dc9caf6d0ec042e39499bbf1d9a7d065/rllib/examples/models/action_mask_model.py


vlainic · May 11, 2022, 12:16pm

In theory, one would use parametric actions with embeddings for a continuous action space. However, I am not sure RLlib implements this, since only the Discrete Actions column in the algorithm list is marked +parametric.

vlainic · May 11, 2022, 12:19pm

ActionMaskModel is a simpler version of ParametricActionsModel:

The first one just masks some discrete actions according to the environment’s rules.
The second one is more general and can be used on huge discrete action spaces with embeddings, or even on continuous action spaces (in theory; I am not sure whether RLlib supports it).
