Training auxiliary models on RL agent output rollouts


I was hoping to train an auxiliary model (something like a VAE in this case) on the input from the environment and the output of the agent (e.g. predict its action distribution, etc.). Does anyone have any thoughts on an optimal way to do this? The auxiliary model does not govern the agent in any way (hence I feel a custom policy may not be applicable) but it does need artifacts from the agent. Do you think it would be wise to create a custom callback with an endpoint to a model’s training loop?


Hi @Samuel_Showalter,

The first thing I would try is creating a custom callback class and overriding its on_learn_on_batch method, which receives each train batch before the policy learns on it.
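A minimal sketch of that callback idea, in case it helps. To keep it runnable without Ray installed it is written as a plain mixin; in RLlib you would presumably combine it with DefaultCallbacks and pass the resulting class via the "callbacks" config key. The class name, the buffer, and the "aux_buffer_size" metric are made up for illustration, and I am assuming the train batch exposes "obs" and "actions" columns.

```python
from collections import deque


class AuxModelCallbackMixin:
    """Hypothetical mixin that collects (obs, action) pairs for an auxiliary model.

    Assumed RLlib usage (not imported here):
        class MyCallbacks(AuxModelCallbackMixin, DefaultCallbacks): ...
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Bounded buffer so memory use stays flat over long training runs.
        self.aux_buffer = deque(maxlen=10_000)

    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        # Assumption: the batch is dict-like with "obs" and "actions" columns.
        pairs = zip(train_batch["obs"], train_batch["actions"])
        self.aux_buffer.extend(pairs)
        # Report something so the hook's activity shows up in the results.
        result["aux_buffer_size"] = len(self.aux_buffer)
```

The auxiliary model (e.g. the VAE) can then sample from aux_buffer on its own schedule, or you can run one gradient step directly inside the hook if the model is cheap enough.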


You could also take a look at how algorithms like IMPALA or APEX create extra learner threads in their execution plans:


These threads are then started and keep receiving training data via a queue. You would also have to specify in your execution plan how the data gets put into that queue, but IMPALA shows you how to do that as well.
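The thread-plus-queue pattern itself can be sketched with the standard library alone. This is not RLlib's own learner thread, just an illustration of the idea; AuxLearnerThread, train_fn, and the None sentinel are hypothetical names and choices of mine.

```python
import queue
import threading


class AuxLearnerThread(threading.Thread):
    """Hypothetical background thread that trains an auxiliary model on queued batches."""

    def __init__(self, train_fn, maxsize=16):
        super().__init__(daemon=True)
        # Bounded queue: producers block instead of piling up unseen batches.
        self.inqueue = queue.Queue(maxsize=maxsize)
        self.train_fn = train_fn
        self.num_updates = 0

    def run(self):
        while True:
            batch = self.inqueue.get()
            if batch is None:  # sentinel: shut down cleanly
                break
            self.train_fn(batch)
            self.num_updates += 1

    def stop(self):
        # Batches queued before the sentinel are still processed first.
        self.inqueue.put(None)
        self.join()
```

In an execution plan, the step that produces train batches would call inqueue.put(batch) on this thread; train_fn would be whatever performs one update of the auxiliary model.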

You can override the default execution plan via:

MyNewTrainerClass = SomeExistingRLlibTrainer.with_updates(execution_plan=[your custom function])