Use a custom encoder before Q model and train it with custom loss

fedetask · October 27, 2022, 4:38pm

High: It blocks me from completing my task.

My environment is partially observable. I have access to its ground truth state at training time, but I do not have it at test time. Therefore, I want to train a model that predicts the environment state given a stack of the last n observations.

My RL model would therefore consist of 3 components: an encoder, a decoder, and the Q values head.

At training time, the encoder takes the last n observations and encodes them into a k-dimensional vector z. The decoder reconstructs the ground truth state from z and is trained with an MSE loss between the actual ground truth state and the predicted one. The Q values head takes the vector z as input and is trained with the common RLlib losses.
At test time, we throw away the decoder, and just use the encoder and Q values head to select actions.
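
To make the setup concrete, here is a minimal sketch of the three components and the joint loss in plain PyTorch. It is not RLlib code: the class name `EncoderDecoderQ`, the layer sizes, the weight `recon_weight`, and the random tensors standing in for a replay batch are all assumptions for illustration, and a Huber loss on a placeholder TD target stands in for whatever Q loss RLlib actually computes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EncoderDecoderQ(nn.Module):
    """Hypothetical model: encoder -> z, decoder(z) -> state, q_head(z) -> Q values."""

    def __init__(self, obs_dim, n_stack, state_dim, num_actions, k=32):
        super().__init__()
        # Encoder: flattened stack of the last n observations -> k-dim vector z.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim * n_stack, 64), nn.ReLU(), nn.Linear(64, k)
        )
        # Decoder: z -> predicted ground truth state (only needed during training).
        self.decoder = nn.Sequential(
            nn.Linear(k, 64), nn.ReLU(), nn.Linear(64, state_dim)
        )
        # Q values head: z -> one Q value per discrete action.
        self.q_head = nn.Sequential(
            nn.Linear(k, 64), nn.ReLU(), nn.Linear(64, num_actions)
        )

    def forward(self, stacked_obs):
        # This path is all that is needed at test time; the decoder is unused.
        z = self.encoder(stacked_obs)
        return self.q_head(z), z

    def reconstruction_loss(self, z, true_state):
        # MSE between the decoded state and the ground truth state.
        return F.mse_loss(self.decoder(z), true_state)


torch.manual_seed(0)
obs_dim, n_stack, state_dim, num_actions, batch = 4, 3, 6, 2, 8
model = EncoderDecoderQ(obs_dim, n_stack, state_dim, num_actions)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random stand-ins for a replay batch (assumption: not real environment data).
stacked_obs = torch.randn(batch, obs_dim * n_stack)
true_state = torch.randn(batch, state_dim)
actions = torch.randint(0, num_actions, (batch,))
td_target = torch.randn(batch)  # placeholder for r + gamma * max_a' Q_target(s', a')

q_values, z = model(stacked_obs)
q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
q_loss = F.smooth_l1_loss(q_taken, td_target)  # Huber loss as a stand-in Q loss
recon_loss = model.reconstruction_loss(z, true_state)

recon_weight = 1.0  # hypothetical weighting between the two objectives
total_loss = q_loss + recon_weight * recon_loss

optimizer.zero_grad()
total_loss.backward()  # gradients reach the encoder from both losses
optimizer.step()
```

Because both losses are summed before `backward()`, the encoder receives gradients from the reconstruction objective and from the Q objective, while the decoder is only touched by the MSE term and can be dropped at test time.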

The n past observations are already included in the observation that the agent receives along with the current one, so that’s not an issue.

How can I add this custom encoder and the corresponding MSE loss, and train them together with the Q values head?
