Does ReLU activation interfere with DQN?

How severely does this issue affect your experience of using Ray? Medium: it contributes significant difficulty to completing my task, but I can work around it.

I have an environment where all rewards are negative and I'm using the RLlib DQN implementation. In the model config, I'm setting 'fcnet_activation': 'relu'. What I would expect is that, since all rewards are negative, the Q-values would not be able to converge, because the ReLU activation function does not allow negative outputs.
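Roughly, the setup I mean looks like this (a minimal sketch assuming the pre-2.0 trainer API; the environment name and layer sizes are just placeholders, not my actual setup):

from ray.rllib.agents.dqn import DQNTrainer

# Sketch only: "CartPole-v1" stands in for my actual environment.
config = {
    "env": "CartPole-v1",
    "model": {
        "fcnet_hiddens": [256, 256],
        "fcnet_activation": "relu",
    },
}
trainer = DQNTrainer(config=config)
result = trainer.train()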

However, the model seems to work just fine.

To investigate this further, I tried adding 'post_fcnet_hiddens': [256] and 'post_fcnet_activation': tune.grid_search([None, 'relu']). Again, I would expect the algorithm to work better with None than with relu, but the two seem to work equally well.
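Concretely, the grid search looked roughly like this (again a sketch; the environment name and stopping criterion are placeholders):

from ray import tune

# Sketch of the experiment comparing None vs. relu on the post-FC stack.
tune.run(
    "DQN",
    stop={"training_iteration": 50},
    config={
        "env": "CartPole-v1",
        "model": {
            "fcnet_activation": "relu",
            "post_fcnet_hiddens": [256],
            "post_fcnet_activation": tune.grid_search([None, "relu"]),
        },
    },
)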

Is RLlib automatically modifying the activation of the last layer when DQN is used? Or are 'fcnet_activation' and 'post_fcnet_activation' simply not applied to the final layer outputs from which DQN gets the Q-values?

Hi @fedetask ,

As far as I can guess here, you are using the default DQN, not touching most of its configuration parameters, especially the ones it silently inherits from MODEL_DEFAULTS. The DQN agent builds its networks (the Q-network and the target network) by calling ModelCatalog.get_model_v2() for each of them (with num_outputs, which becomes important shortly).
Calling this function triggers RLlib's semi-automatic model choice, which depends on the input and output shapes (obs and num_outputs). If you then look into the different default model classes, you will see something like this:

if num_outputs:
    logits_out = tf.keras.layers.Dense(
        num_outputs,
        name="fc_out",
        activation=None,
        kernel_initializer=normc_initializer(0.01),
    )(last_layer)

i.e. the last layer is always None-activated. Now, I do not know what your observation space looks like, but you will see this in many of the default networks (fully connected network, CNN, etc.).

For the hidden layers, the negative rewards do not play a role, as you do not feed these rewards into the network - and even if you did, the networks are randomly initialized, so the layer outputs should not all be negative. The last layer is then None-activated, and negative output values can occur.
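A quick way to convince yourself of this outside of RLlib (plain Keras; the layer sizes here are arbitrary):

import numpy as np
import tensorflow as tf

# Toy net mirroring the structure above: ReLU hidden layers plus a final
# Dense layer with activation=None.
inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(256, activation="relu")(inputs)
x = tf.keras.layers.Dense(256, activation="relu")(x)
q_out = tf.keras.layers.Dense(2, activation=None)(x)  # linear output, can go negative
model = tf.keras.Model(inputs, q_out)

print(model(np.random.randn(5, 4).astype(np.float32)))
# With a random init you will typically see both positive and negative values here.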

Hope this helps.
Simon


It does help, thank you very much! So not even post_fcnet_hiddens and post_fcnet_activation modify this behavior, right?

@fedetask, exactly. The layer with a None activation is always attached to the network. If you need another layer at the end, you would need to write a custom model.
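For completeness, a minimal sketch of what such a custom model can look like with the ModelV2 API (old stack). The class name and layer sizes are placeholders, and depending on the algorithm RLlib may still build additional heads on top of the model output:

import tensorflow as tf
from ray.rllib.models import ModelCatalog
from ray.rllib.models.tf.tf_modelv2 import TFModelV2


class MyLastLayerModel(TFModelV2):
    """Illustrative only; name and sizes are placeholders."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        inputs = tf.keras.Input(shape=obs_space.shape)
        hidden = tf.keras.layers.Dense(256, activation="relu")(inputs)
        out = tf.keras.layers.Dense(num_outputs, activation=None)(hidden)  # your own last layer
        self.base_model = tf.keras.Model(inputs, out)
        # Depending on the RLlib version you may also need:
        # self.register_variables(self.base_model.variables)

    def forward(self, input_dict, state, seq_lens):
        return self.base_model(input_dict["obs"]), state


ModelCatalog.register_custom_model("my_last_layer_model", MyLastLayerModel)
# Then in the config: "model": {"custom_model": "my_last_layer_model"}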


Hello, sorry to bother you again.

I am using a custom model, but that final layer is still added somewhere by RLlib. I want to disable this behavior so that I can handle the Q-values directly in my custom model (I need to implement action masking, so the output of my model must already be the masked Q-values). How can I do this?