Some RLlib algorithms support an LR annealing schedule via “lr_schedule”. Is there support for annealing “gamma”? If not, how should I go about implementing it? Where in the code should I start digging? Thanks!
Here is an example of how the LR is annealed.
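For reference, the lr_schedule config is typically a list of [timestep, value] pairs that RLlib interpolates linearly between; a gamma schedule would most naturally mirror that format. A sketch of what such a config fragment might look like (the exact values are made up):

```python
# lr_schedule is given as [timestep, value] pairs; RLlib linearly
# interpolates the learning rate between the listed points.
config = {
    "lr": 1e-3,
    "lr_schedule": [
        [0, 1e-3],          # start at 1e-3
        [1_000_000, 1e-5],  # anneal to 1e-5 over 1M timesteps
    ],
    # A hypothetical "gamma_schedule" (NOT a built-in RLlib key) could
    # reuse the same format:
    # "gamma_schedule": [[0, 0.95], [1_000_000, 0.99]],
}
```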
From a quick look through the code, it seems that gamma is not stored as a member variable of the policy but only in the config, so I think there are two potential approaches:
1. Create a GammaMixin, following the LR example above, that modifies self.config[“gamma”].
2. Use the on_learn_on_batch callback to update self.config[“gamma”].
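Either way, the core of the idea is the same: interpolate a value from a schedule at the current timestep and write it back into the policy's config before the next loss computation. Here is a minimal, framework-free sketch in the spirit of the callback approach; the schedule format mirrors lr_schedule, but `GAMMA_SCHEDULE`, the `global_timestep` attribute, and the callback wiring are assumptions, not an actual RLlib API:

```python
def interp_schedule(schedule, t):
    """Linearly interpolate a [[timestep, value], ...] schedule at time t."""
    if t <= schedule[0][0]:
        return schedule[0][1]
    for (t0, v0), (t1, v1) in zip(schedule, schedule[1:]):
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)
    # Past the last point: hold the final value.
    return schedule[-1][1]

# Hypothetical schedule: anneal gamma from 0.95 to 0.99 over 100k timesteps.
GAMMA_SCHEDULE = [[0, 0.95], [100_000, 0.99]]

def on_learn_on_batch(policy, train_batch, result, **kwargs):
    # Gamma is read from the config when the loss is built, so mutate it there.
    # `global_timestep` is an assumed attribute; adjust to your RLlib version.
    t = policy.global_timestep
    policy.config["gamma"] = interp_schedule(GAMMA_SCHEDULE, t)
```

A GammaMixin would do essentially the same interpolation, just triggered from the policy's own update hook instead of a callback.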
One thing I am not sure about is whether these approaches will work if you are using an asynchronous training method like APPO, APEX, IMPALA, etc.
There are a couple of issues / pull requests open related to this right now: