Masking Invalid Actions for DQN Algorithm

Archana_R · March 22, 2023, 2:27pm

I am not able to mask all the invalid actions when using the DQN algorithm.
Here is the custom model I created:

# Imports assume a Ray 2.x layout; module paths differ in older releases.
from gymnasium.spaces import Box  # `gym.spaces` on older Ray versions
from ray.rllib.algorithms.dqn.distributional_q_tf_model import DistributionalQTFModel
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork
from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()


class ActionMaskModel(DistributionalQTFModel):

    def __init__(self, obs_space, action_space, num_outputs,
                 model_config, name, true_obs_shape=(870,),
                 action_embed_size=144, **kw):
        super(ActionMaskModel, self).__init__(
            obs_space, action_space, num_outputs, model_config, name, **kw)
        print("action_space :", action_space)
        print("model_config :", model_config)
        # Network that maps the true observation to an action embedding.
        self.action_embed_model = FullyConnectedNetwork(
            Box(-1, 1, shape=true_obs_shape),
            action_space,
            action_embed_size,
            model_config,
            name + "_action_embed",
        )

    def forward(self, input_dict, state, seq_lens):
        # Extract the available actions tensor from the observation.
        avail_actions = input_dict["obs"]["avail_actions"]
        action_mask = input_dict["obs"]["action_mask"]

        # Compute the predicted action embedding.
        action_embed, _ = self.action_embed_model({"obs": input_dict["obs"]["state"]})

        # Expand the model output to [BATCH, 1, EMBED_SIZE]. Note that the
        # avail actions tensor is of shape [BATCH, MAX_ACTIONS, EMBED_SIZE].
        intent_vector = tf.expand_dims(action_embed, 1)

        # Batch dot product => shape of logits is [BATCH, MAX_ACTIONS].
        action_logits = tf.reduce_sum(avail_actions * intent_vector, axis=2)

        # Mask out invalid actions (use tf.float32.min for stability).
        inf_mask = tf.maximum(tf.math.log(action_mask), tf.float32.min)
        return action_logits + inf_mask, state

Am I missing anything here? The model even inherits from DistributionalQTFModel.

mannyv · March 22, 2023, 6:59pm

Hi @Archana_R,

Action masking will not work like that for DQN. The values you are masking are intermediate activations, not the policy logits for the actions. The type of masking you are doing only works for on-policy models like PPO or A2C.
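The point about intermediate activations can be illustrated with a small NumPy sketch. This is not RLlib code: the activations and the weight matrix `W` are made-up stand-ins for whatever Q-head is applied after the custom model's output. It shows that a large negative value added before a further layer does not pin the final Q-value, whereas applying the same mask to the final Q-values does.

```python
import numpy as np

action_mask = np.array([1.0, 0.0, 1.0, 0.0])  # actions 1 and 3 are invalid

# Same trick as in forward(): 0 for valid actions, a huge negative
# number for invalid ones.
with np.errstate(divide="ignore"):
    inf_mask = np.maximum(np.log(action_mask), np.finfo(np.float32).min)

# Hypothetical model output before any Q-head.
activations = np.array([0.5, 0.2, -0.1, 0.3])

# Hypothetical Q-head weights: identity, except one negative weight.
W = np.eye(4)
W[1, 1] = -1.0

# Case 1: mask the intermediate activations, then apply the Q-head.
# The negative weight flips the sign of the masked value.
q_after_head = (activations + inf_mask) @ W
picked_1 = int(np.argmax(q_after_head))

# Case 2: apply the Q-head first, then mask the final Q-values.
masked_q = activations @ W + inf_mask
picked_2 = int(np.argmax(masked_q))

print("masking activations picks action", picked_1,
      "valid:", bool(action_mask[picked_1]))
print("masking Q-values picks action", picked_2,
      "valid:", bool(action_mask[picked_2]))
```

In the first case the greedy action is an invalid one, because the layer after the mask is free to transform the masked value; in the second case the mask is the last operation before the argmax, so only valid actions can be selected.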

Archana_R · March 22, 2023, 7:14pm

How do you suggest I do it for DQN?
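One possible direction, offered as a hedged sketch rather than a confirmed answer from this thread: RLlib's own parametric-actions example passes the two DQN settings below, so that the output of a custom model's `forward()` is used directly as the Q-values, with no extra dense layers or dueling head applied afterwards. Whether this fits the setup described here is an assumption. `dqn_overrides` and the registered model name `"action_mask_model"` are hypothetical, and the dict would still need to be merged into the full algorithm config.

```python
# Assumed DQN overrides, following RLlib's parametric-actions example.
dqn_overrides = {
    # No extra dense layers after the custom model's output.
    "hiddens": [],
    # No separate value/advantage streams.
    "dueling": False,
    # Hypothetical name under which the custom model was registered.
    "model": {"custom_model": "action_mask_model"},
}
```

With these settings the masked values returned by `forward()` are the values the greedy action selection sees, which is the precondition for the masking in the model above to have any effect.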
