Does ReLU activation interfere with DQN?

How severely does this issue affect your experience of using Ray? Medium: it contributes significant difficulty to completing my task, but I can work around it.

I have an environment where all rewards are negative and I'm using the RLlib DQN implementation. In the model config, I'm setting 'fcnet_activation': 'relu'. What I would expect is that, since all rewards are negative, the Q-values would not be able to converge, because the ReLU activation function does not allow negative outputs.
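Roughly, the setup I mean looks like this (a minimal sketch assuming the pre-2.0 trainer API; the environment name and layer sizes are just placeholders, not my actual setup):

from ray.rllib.agents.dqn import DQNTrainer

# Sketch only: "CartPole-v1" stands in for my actual environment.
config = {
    "env": "CartPole-v1",
    "model": {
        "fcnet_hiddens": [256, 256],
        "fcnet_activation": "relu",
    },
}
trainer = DQNTrainer(config=config)
result = trainer.train()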

However, the model seems to work just fine.

To investigate this further, I tried adding 'post_fcnet_hiddens': [256] and 'post_fcnet_activation': tune.grid_search([None, 'relu']). Again, I would expect the algorithm to work better with None than with relu, but the two seem to work equally well.
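Concretely, the grid search looked roughly like this (again a sketch; the environment name and stopping criterion are placeholders):

from ray import tune

# Sketch of the experiment comparing None vs. relu on the post-FC stack.
tune.run(
    "DQN",
    stop={"training_iteration": 50},
    config={
        "env": "CartPole-v1",
        "model": {
            "fcnet_activation": "relu",
            "post_fcnet_hiddens": [256],
            "post_fcnet_activation": tune.grid_search([None, "relu"]),
        },
    },
)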

Is RLlib automatically modifying the activation of the last layer when DQN is used? Or are 'fcnet_activation' and 'post_fcnet_activation' simply not applied to the final layer outputs from which DQN gets the Q-values?

Hi @fedetask ,

As far as I can guess here, you are using the default DQN, not touching most of its configuration parameters, especially the ones it silently inherits from MODEL_DEFAULTS. The DQN agent builds its networks (the Q-network and the target network) by calling ModelCatalog.get_model_v2() for each of them (with num_outputs, which becomes important shortly).
Calling this function triggers RLlib's semi-automatic model choice, which depends on the input and output shapes (obs and num_outputs). If you then look into the different default model classes, you will see something like this:

if num_outputs:
    logits_out = tf.keras.layers.Dense(
        num_outputs,
        name="fc_out",
        activation=None,
        kernel_initializer=normc_initializer(0.01),
    )(last_layer)

i.e. the last layer is always None-activated. Now, I do not know what your observation space looks like, but you will see this in many of the default networks (fully connected network, CNN, etc.).

For the hidden layers, the negative rewards do not play a role, as you do not feed these rewards into the network - and even if you did, the networks are randomly initialized, so the layer outputs should not all be negative. The last layer is then None-activated, and negative output values can occur.
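A quick way to convince yourself of this outside of RLlib (plain Keras; the layer sizes here are arbitrary):

import numpy as np
import tensorflow as tf

# Toy net mirroring the structure above: ReLU hidden layers plus a final
# Dense layer with activation=None.
inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(256, activation="relu")(inputs)
x = tf.keras.layers.Dense(256, activation="relu")(x)
q_out = tf.keras.layers.Dense(2, activation=None)(x)  # linear output, can go negative
model = tf.keras.Model(inputs, q_out)

print(model(np.random.randn(5, 4).astype(np.float32)))
# With a random init you will typically see both positive and negative values here.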

Hope this helps.
Simon


It does help, thank you very much! So not even post_fcnet_hiddens and post_fcnet_activation modify this behavior, right?

@fedetask, exactly. The layer with a None activation is always attached to the network. If you need another layer at the end, you would need to write a custom model.
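For completeness, a minimal sketch of what such a custom model can look like with the ModelV2 API (old stack). The class name and layer sizes are placeholders, and depending on the algorithm RLlib may still build additional heads on top of the model output:

import tensorflow as tf
from ray.rllib.models import ModelCatalog
from ray.rllib.models.tf.tf_modelv2 import TFModelV2


class MyLastLayerModel(TFModelV2):
    """Illustrative only; name and sizes are placeholders."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        inputs = tf.keras.Input(shape=obs_space.shape)
        hidden = tf.keras.layers.Dense(256, activation="relu")(inputs)
        out = tf.keras.layers.Dense(num_outputs, activation=None)(hidden)  # your own last layer
        self.base_model = tf.keras.Model(inputs, out)
        # Depending on the RLlib version you may also need:
        # self.register_variables(self.base_model.variables)

    def forward(self, input_dict, state, seq_lens):
        return self.base_model(input_dict["obs"]), state


ModelCatalog.register_custom_model("my_last_layer_model", MyLastLayerModel)
# Then in the config: "model": {"custom_model": "my_last_layer_model"}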


Hello, sorry to bother you again.

I am using a custom model, but that final layer is still added somewhere by RLlib. I want to disable this behavior so that I can handle the Q-values directly in my custom model (I need to implement action masking, so the output of my model must already be the masked Q-values). How can I do this?