I want to customize the neural network structure of my reinforcement learning agent, and I found that the model config can be defined directly. But I have a question: can "use_lstm" and "fcnet_hiddens" be used at the same time?
If I set the PPO agent's model config like this:
"model": {
    "fcnet_hiddens": [256, 128],
    "fcnet_activation": "tanh",
    "use_lstm": True,
    "max_seq_len": 20,
},
Welcome to Ray and RLlib. Yes, the two settings are used together. The network would look like this:
FC(256) → FC(128) → LSTM(256) → FC(num_actions)
There is also a second branch from the LSTM output for the value function; there is only one LSTM here, not two:
LSTM(256) → FC(1)
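To see the two settings working together, here is a minimal sketch (assuming the RLlib 1.x Trainer API, the torch framework, and CartPole-v1 as a placeholder environment) that builds a PPO policy with this config and prints the resulting network:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

trainer = PPOTrainer(config={
    "env": "CartPole-v1",          # placeholder environment
    "framework": "torch",
    "num_workers": 0,              # keep it lightweight for a quick check
    "model": {
        "fcnet_hiddens": [256, 128],
        "fcnet_activation": "tanh",
        "use_lstm": True,          # wraps the FC stack with an LSTM
    },
})

# Printing the policy's model shows the FC layers, the LSTM wrapper,
# and the separate logits and value-function heads.
print(trainer.get_policy().model)
```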
One word on your layer sizes: this would work, but it is a common convention that the LSTM should have the same number of units as the previous layer or fewer, not more. That setting is lstm_cell_size. The max_seq_len setting means that backpropagation through time for the LSTM will be truncated every 20 environment steps.
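So with fcnet_hiddens of [256, 128], a model config that follows that convention could look like this (just a sketch; the values are only an illustration):

```python
# Model config following the convention above: LSTM no larger than the last FC layer.
model_config = {
    "fcnet_hiddens": [256, 128],
    "fcnet_activation": "tanh",
    "use_lstm": True,
    "lstm_cell_size": 128,   # same size as (or smaller than) the last FC layer
    "max_seq_len": 20,       # truncate backprop through time after 20 steps
}
```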
It depends on what you mean. I am pretty sure that the policy that produces the logits used to choose an action can have an LSTM in it, but I think you are right that the model used by the Curiosity exploration module does not support RNNs.
It is also the case that Curiosity exploration can only be used if num_workers is 0.
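For what it's worth, a sketch of what that looks like in a config (based on the documented Curiosity exploration settings; the specific values here are only illustrative defaults, and only the torch framework supports this module as far as I know):

```python
config = {
    "framework": "torch",      # Curiosity is implemented for the torch framework
    "num_workers": 0,          # required when using the Curiosity exploration module
    "exploration_config": {
        "type": "Curiosity",
        "eta": 1.0,            # weight of the intrinsic (curiosity) reward
        "lr": 0.001,           # learning rate for the curiosity sub-networks
        "feature_dim": 288,    # dimensionality of the learned feature embedding
        "sub_exploration": {
            "type": "StochasticSampling",  # exploration applied on top of the intrinsic reward
        },
    },
}
```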