Some RLlib algorithms support an LR annealing schedule via “lr_schedule”. Is there support for annealing “gamma”? If not, how should I go about implementing it? Where in the code should I start digging? Thanks!
Here is an example of how the LR is annealed.
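For reference, the lr_schedule config is typically a list of [timestep, value] pairs that RLlib interpolates linearly between; a gamma schedule would most naturally mirror that format. A sketch of what such a config fragment might look like (the exact values are made up):

```python
# lr_schedule is given as [timestep, value] pairs; RLlib linearly
# interpolates the learning rate between the listed points.
config = {
    "lr": 1e-3,
    "lr_schedule": [
        [0, 1e-3],          # start at 1e-3
        [1_000_000, 1e-5],  # anneal to 1e-5 over 1M timesteps
    ],
    # A hypothetical "gamma_schedule" (NOT a built-in RLlib key) could
    # reuse the same format:
    # "gamma_schedule": [[0, 0.95], [1_000_000, 0.99]],
}
```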
From a quick look through the code, it seems that gamma is not stored as a member variable of the policy but only in the config, so I think there are two potential approaches:
1. Create a GammaMixin, following the LR example above, that modifies self.config[“gamma”].
2. Use the on_learn_on_batch callback to update self.config[“gamma”].
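Either way, the core of the idea is the same: interpolate a value from a schedule at the current timestep and write it back into the policy's config before the next loss computation. Here is a minimal, framework-free sketch in the spirit of the callback approach; the schedule format mirrors lr_schedule, but `GAMMA_SCHEDULE`, the `global_timestep` attribute, and the callback wiring are assumptions, not an actual RLlib API:

```python
def interp_schedule(schedule, t):
    """Linearly interpolate a [[timestep, value], ...] schedule at time t."""
    if t <= schedule[0][0]:
        return schedule[0][1]
    for (t0, v0), (t1, v1) in zip(schedule, schedule[1:]):
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)
    # Past the last point: hold the final value.
    return schedule[-1][1]

# Hypothetical schedule: anneal gamma from 0.95 to 0.99 over 100k timesteps.
GAMMA_SCHEDULE = [[0, 0.95], [100_000, 0.99]]

def on_learn_on_batch(policy, train_batch, result, **kwargs):
    # Gamma is read from the config when the loss is built, so mutate it there.
    # `global_timestep` is an assumed attribute; adjust to your RLlib version.
    t = policy.global_timestep
    policy.config["gamma"] = interp_schedule(GAMMA_SCHEDULE, t)
```

A GammaMixin would do essentially the same interpolation, just triggered from the policy's own update hook instead of a callback.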
One thing I am not sure about is whether these approaches will work if you are using an asynchronous training method like APPO, APEX, IMPALA, etc.
There are a couple of issues / pull requests open related to this right now: