Support for annealing gamma

Some RLlib algorithms support a LR annealing schedule via “lr_schedule”. Is there support for annealing “gamma”? If not, how should I go about implementing it? Where in the code should I start digging? Thanks!

I just saw this: [RLlib] updating batch_size or similar while training - #2 by sven1977

@RickLan,

Here is an example of how the LR is annealed.
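For context, the pattern is a mixin on the policy: a schedule object is built from the config, and the policy's on_global_var_update() hook (which RLlib calls with the current global timestep) looks up the scheduled value and pushes it into the optimizers. Roughly like this (a condensed, from-memory sketch of the torch LearningRateSchedule mixin, not the exact source):

```python
from ray.rllib.utils.schedules import PiecewiseSchedule


class LearningRateSchedule:
    """Sketch of the mixin pattern RLlib uses to anneal the LR."""

    def __init__(self, lr, lr_schedule):
        self._lr_schedule = None
        self.cur_lr = lr
        if lr_schedule is not None:
            # "lr_schedule" is a list of (timestep, value) endpoints.
            self._lr_schedule = PiecewiseSchedule(
                lr_schedule, outside_value=lr_schedule[-1][-1], framework=None
            )
            self.cur_lr = self._lr_schedule.value(0)

    def on_global_var_update(self, global_vars):
        super().on_global_var_update(global_vars)
        if self._lr_schedule is not None:
            self.cur_lr = self._lr_schedule.value(global_vars["timestep"])
            # Push the new value into the torch optimizer(s).
            for opt in self._optimizers:
                for p in opt.param_groups:
                    p["lr"] = self.cur_lr
```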

From a quick look through the code, it seems that gamma is not stored as a member variable of the policy but only in the config. So I think there are two potential approaches.

1. Create a GammaMixin, following the LR example above, that modifies self.config["gamma"] (see the first sketch below this list).
2. You might also be able to use the on_learn_on_batch callback (see the second sketch below).
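For approach 1, here is a minimal sketch of what such a mixin might look like, modeled on the LR mixin pattern above. Everything here is my assumption rather than an existing RLlib API: the GammaSchedule name, the gamma_schedule config key and its (timestep, value) format, and the trick of writing the annealed value back into self.config["gamma"]:

```python
from ray.rllib.utils.schedules import PiecewiseSchedule


class GammaSchedule:
    """Hypothetical mixin (for a TorchPolicy) that anneals config["gamma"]."""

    def __init__(self, gamma, gamma_schedule):
        self._gamma_schedule = None
        self.cur_gamma = gamma
        if gamma_schedule is not None:
            # Assumed format: list of (timestep, value) pairs, mirroring
            # RLlib's "lr_schedule".
            self._gamma_schedule = PiecewiseSchedule(
                gamma_schedule,
                outside_value=gamma_schedule[-1][-1],
                framework=None,
            )
            self.cur_gamma = self._gamma_schedule.value(0)

    def on_global_var_update(self, global_vars):
        super().on_global_var_update(global_vars)
        if self._gamma_schedule is not None:
            self.cur_gamma = self._gamma_schedule.value(global_vars["timestep"])
            # Gamma is read from the config (not from a policy attribute),
            # so write the annealed value back there; postprocessing/loss
            # code that reads config["gamma"] sees it on the next batch.
            self.config["gamma"] = self.cur_gamma
```

For approach 2, something along these lines might work (untested; the linear 0.95 to 0.99 anneal over the first 1M timesteps is an arbitrary placeholder, and on newer Ray versions the import path is ray.rllib.algorithms.callbacks):

```python
from ray.rllib.agents.callbacks import DefaultCallbacks


class GammaAnnealCallbacks(DefaultCallbacks):
    """Hypothetical callbacks that anneal gamma before each update."""

    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        # policy.global_timestep is maintained by RLlib.
        frac = min(policy.global_timestep / 1_000_000, 1.0)
        # Linearly anneal gamma from 0.95 to 0.99 (placeholder values).
        policy.config["gamma"] = 0.95 + frac * (0.99 - 0.95)
```

You would register it with config["callbacks"] = GammaAnnealCallbacks in the trainer config. Note that both sketches only change gamma for future batches; advantages already computed with the old gamma are unaffected.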

One thing I am not sure about is whether these approaches will work if you are using an asynchronous training method like APPO, APEX, or IMPALA.

There are a couple of issues / pull requests open about this right now: