Training auxiliary models on RL agent output rollouts

Hi,

I was hoping to train an auxiliary model (a VAE, in this case) on the input from the environment and the output of the agent (e.g. to predict its action distribution). Does anyone have thoughts on a good way to do this? The auxiliary model does not govern the agent in any way (hence I feel a custom policy may not be applicable), but it does need artifacts from the agent. Do you think it would be wise to create a custom callback that hooks into the model's training loop?

Thanks,
Sam

Hi @Samuel_Showalter,

The first thing I would try is creating a custom callbacks class and overriding the on_learn_on_batch method.

https://docs.ray.io/en/master/rllib-training.html?highlight=Callbacks#callbacks-and-custom-metrics
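For example, something along these lines (a rough, untested sketch; AuxVAE and its partial_fit method are placeholders for your own auxiliary model, and the exact on_learn_on_batch signature may differ slightly between RLlib versions):

```python
from ray.rllib.agents.callbacks import DefaultCallbacks


class AuxVAE:
    """Placeholder for your auxiliary model (e.g. a VAE)."""

    def partial_fit(self, obs, action_dist_inputs):
        # Run one gradient step of the auxiliary model here.
        pass


class AuxModelCallbacks(DefaultCallbacks):
    def __init__(self):
        super().__init__()
        self.aux_model = AuxVAE()

    def on_learn_on_batch(self, *, policy, train_batch, result=None, **kwargs):
        # train_batch is a SampleBatch; "obs" and "action_dist_inputs" are
        # standard columns, but check what your policy actually records.
        self.aux_model.partial_fit(
            train_batch["obs"], train_batch["action_dist_inputs"]
        )
```

You would then pass it in via config={"callbacks": AuxModelCallbacks}. Note that the callback runs on whichever worker does the learning, so the auxiliary model will live there as well.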


You could also take a look at how algos like IMPALA or APEX create these extra learner-threads in their execution plans:

e.g. ray/rllib/agents/impala/impala.py.

These threads are then kicked off and keep receiving training data via a queue. You would also have to specify in your execution plan how the data gets put into that queue; again, IMPALA shows you how to do that.
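Here is a rough, untested sketch of that idea, assuming the pre-2.0 execution-plan API from ray.rllib.execution (the queue wiring, the aux_learner_loop thread, and train_aux_model are all placeholders for your own VAE update logic):

```python
import queue
import threading

from ray.rllib.execution.rollout_ops import ParallelRollouts, ConcatBatches
from ray.rllib.execution.train_ops import TrainOneStep
from ray.rllib.execution.metric_ops import StandardMetricsReporting

# Queue feeding the auxiliary learner thread.
aux_batch_queue = queue.Queue(maxsize=16)


def train_aux_model(batch):
    """Placeholder for one update step of your auxiliary model."""
    pass


def aux_learner_loop():
    # Runs on the driver, consuming batches produced by the execution plan.
    while True:
        batch = aux_batch_queue.get()
        train_aux_model(batch)


def custom_execution_plan(workers, config):
    # Start the auxiliary learner thread once, on the driver.
    threading.Thread(target=aux_learner_loop, daemon=True).start()

    rollouts = ParallelRollouts(workers, mode="bulk_sync")

    def tee_to_aux_queue(batch):
        # Non-blocking put so RL training never stalls on the aux model.
        try:
            aux_batch_queue.put_nowait(batch)
        except queue.Full:
            pass
        return batch

    train_op = (
        rollouts
        .combine(ConcatBatches(min_batch_size=config["train_batch_size"]))
        .for_each(tee_to_aux_queue)
        .for_each(TrainOneStep(workers))
    )
    return StandardMetricsReporting(train_op, workers, config)
```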

You can override the default exec-plan via:

MyNewTrainerClass = SomeExistingRLlibTrainer.with_updates(execution_plan=[your custom function])
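For instance, plugging the sketch above into an existing trainer (PPOTrainer is just an arbitrary example of a build_trainer-based Trainer class that supports with_updates):

```python
from ray.rllib.agents.ppo import PPOTrainer  # any build_trainer-based Trainer

# Reuse PPO's policy and config, but swap in the custom execution plan above.
MyAuxTrainer = PPOTrainer.with_updates(execution_plan=custom_execution_plan)
trainer = MyAuxTrainer(config={"env": "CartPole-v0", "num_workers": 2})
result = trainer.train()
```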