AttentionNet with action masking

Hi all,
I am able to use AttentionNet with my own environment and it works well. I can also use an FC net with action masking, but when I try to use AttentionNet with action masking, it fails…
Here is my code. I don’t know if I have to specify state and seq_lens in the forward function below.

```python
class ParametricActionsAttentionModel(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name, true_obs_shape=(10102,), **kwargs):
        super(ParametricActionsAttentionModel, self).__init__(
            obs_space, action_space, action_space.n, model_config, name)
        self.seq_length = model_config["max_seq_len"]
        self.attention_dim = model_config["attention_dim"]
        self.train_batch_size = model_config["train_batch_size"]
        model_config.pop("custom_model")
        self.model = ModelCatalog.get_model_v2(
            obs_space=Box(low=-3.0, high=3.0, shape=true_obs_shape),
            action_space=action_space,
            num_outputs=action_space.n,
            model_config=model_config,
            framework="tf2")

    def forward(self, input_dict, state, seq_lens):
        action_mask = input_dict["obs"]["action_mask"]
        action_logits, _ = self.model(
            {"obs": input_dict["obs"]["state"]}, state, seq_lens)
        # Mask out invalid actions (use tf.float32.min for stability)
        inf_mask = tf.maximum(tf.math.log(action_mask), tf.float32.min)
        return action_logits + inf_mask, state

    def value_function(self):
        return self.model.value_function()
```
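For what it's worth, the masking trick in `forward` can be checked in isolation. A small sketch with NumPy standing in for the TF ops (example logits and mask are made up) shows why adding `log(mask)` clamped at `float32.min` zeroes out invalid actions after softmax:

```python
import numpy as np

def masked_logits(logits, mask):
    # log(0) -> -inf; clamp at float32 min so inf - inf = nan can never occur
    with np.errstate(divide="ignore"):
        inf_mask = np.maximum(np.log(mask), np.finfo(np.float32).min)
    return logits + inf_mask

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0], dtype=np.float32)
mask = np.array([1.0, 0.0, 1.0, 0.0], dtype=np.float32)  # actions 1 and 3 invalid

probs = softmax(masked_logits(logits, mask))
print(probs)  # all probability mass lands on actions 0 and 2
```

Masked actions end up with logit `float32.min`, so their softmax probability underflows to exactly zero while valid actions keep their relative weights.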

And the error I got:

```
python3.8/site-packages/ray/rllib/models/modelv2.py", line 230, in call
(pid=1300201) res = self.forward(restored, state or [], seq_lens)
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/ray/rllib/models/tf/attention_net.py", line 444, in forward
(pid=1300201) wrapped_out, _ = self._wrapped_forward(input_dict, [], None)
(pid=1300201) File "/git/rl/ray/tf_models.py", line 239, in forward
(pid=1300201) action_logits, _ = self.model(input_dict, state, seq_lens)
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/ray/rllib/models/modelv2.py", line 230, in call
(pid=1300201) res = self.forward(restored, state or [], seq_lens)
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/ray/rllib/models/tf/attention_net.py", line 482, in forward
(pid=1300201) self._features, memory_outs = self.gtrxl(input_dict, state, seq_lens)
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/ray/rllib/models/modelv2.py", line 230, in call
(pid=1300201) res = self.forward(restored, state or [], seq_lens)
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/ray/rllib/models/tf/attention_net.py", line 320, in forward
(pid=1300201) observations = tf.reshape(observations,
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
(pid=1300201) return target(*args, **kwargs)
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 195, in reshape
(pid=1300201) result = gen_array_ops.reshape(tensor, shape, name)
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8397, in reshape
(pid=1300201) _, _, _op, _outputs = _op_def_library._apply_op_helper(
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 748, in _apply_op_helper
(pid=1300201) op = g._create_op_internal(op_type_name, inputs, dtypes=None,
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3557, in _create_op_internal
(pid=1300201) ret = Operation(
(pid=1300201) File "/.virtualenvs/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 2045, in init
(pid=1300201) self._traceback = tf_stack.extract_stack_for_node(self._c_op)
```

Hello! I would like to know: is action masking available off the shelf in RLlib?

Hi @yinyee,

Action masking is supported by RLlib, but you have to write the model that does the action masking yourself.

Here is an example:

Hi @Adri31,

Is there more to the stack trace? It looks like the actual error message is missing.

If you wrap your code and error message in three backticks (```), it will preserve the formatting and be easier to read.

There is an issue here:

```python
action_logits, _ = self.model(
    {"obs": input_dict["obs"]["state"]}, state, seq_lens)
```

You are not returning the updated memory states, so the wrapped model will always see the initial state.
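To illustrate in plain Python (made-up classes, not the RLlib API): a wrapper that discards the inner model's returned state keeps re-feeding the same stale memory, whereas threading the returned state through lets the attention memory advance:

```python
class InnerModel:
    """Stands in for the GTrXL: returns output plus updated memory state."""
    def forward(self, obs, state):
        new_state = [s + 1 for s in state]  # pretend memory update
        return obs * 2, new_state

class BuggyWrapper:
    def __init__(self):
        self.inner = InnerModel()
    def forward(self, obs, state):
        out, _ = self.inner.forward(obs, state)  # updated state discarded
        return out, state                        # caller re-feeds stale state

class FixedWrapper:
    def __init__(self):
        self.inner = InnerModel()
    def forward(self, obs, state):
        out, new_state = self.inner.forward(obs, state)
        return out, new_state                    # state advances each step

def rollout(model, steps=3):
    state = [0]
    for _ in range(steps):
        _, state = model.forward(1.0, state)
    return state

print(rollout(BuggyWrapper()))  # [0] -- memory never advances
print(rollout(FixedWrapper()))  # [3] -- memory advances each step
```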

This is probably not the source of your error though.

Do you have a full reproduction script you could share in Google Colab?

Thanks @mannyv for your help. I tried to reproduce the problem with a well-known environment => Google Colab

With my environment, when "use_attention" is False, it works with action masking. When "use_attention" is True, it fails with a weird error: "Received a label of 100 which is outside the valid range [0, 100)" in tf_action_dist.py:70.

In the Colab code, something is very strange: I tried several environments, and Ray does not take into account the stop criterion (iteration = 10), the custom model, or the model itself… Something must be wrong in my code…