Thanks to the official RLlib example custom_metrics_and_callbacks.py, I am able to gather the max and mean advantages by accessing postprocessed_batch["advantages"] inside the on_postprocess_trajectory() callback.
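
For reference, this is roughly what I am doing for the advantages. It is only a minimal sketch, assuming a torch PPO setup and a recent Ray release where DefaultCallbacks lives in ray.rllib.algorithms.callbacks:

```python
import numpy as np

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class AdvantageMetricsCallbacks(DefaultCallbacks):
    def on_postprocess_trajectory(
        self, *, worker, episode, agent_id, policy_id, policies,
        postprocessed_batch, original_batches, **kwargs
    ):
        # "advantages" is filled in by GAE postprocessing (e.g. with PPO).
        advantages = postprocessed_batch["advantages"]
        episode.custom_metrics["advantages_max"] = float(np.max(advantages))
        episode.custom_metrics["advantages_mean"] = float(np.mean(advantages))
```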
Now I have another straightforward question. Robust PLR may switch between training and evaluation for each episode, based on the sample-replay decision. To toggle training and evaluation for all workers on a per-episode basis, would calling policies["default_policy"].model.eval() or policies["default_policy"].model.train() be sufficient to switch modes for the specific episodes?
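
To make the question concrete, here is roughly what I have in mind. This is only a sketch: sample_replay_decision() is a hypothetical placeholder for the actual Robust PLR replay decision, and I am again assuming the torch framework and DefaultCallbacks from ray.rllib.algorithms.callbacks. Since on_episode_start() fires on every rollout worker, my hope is that this toggles the mode everywhere:

```python
import random

from ray.rllib.algorithms.callbacks import DefaultCallbacks


def sample_replay_decision() -> bool:
    # Hypothetical stand-in for the Robust PLR sample-replay decision;
    # in practice this would query the level-replay buffer.
    return random.random() < 0.5


class PLRModeCallbacks(DefaultCallbacks):
    def on_episode_start(
        self, *, worker, base_env, policies, episode, env_index, **kwargs
    ):
        # For torch policies, .model is an nn.Module, so train()/eval() exist.
        model = policies["default_policy"].model
        if sample_replay_decision():
            model.train()  # replayed level -> training mode
        else:
            model.eval()   # new level -> evaluation mode
```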