Some RLlib algorithms support an LR annealing schedule via `lr_schedule`. Is there support for annealing `gamma`? If not, how should I go about implementing it? Where in the code should I start digging? Thanks!
@RickLan,
Here is an example of how the LR is annealed, from the `LearningRateSchedule` mixin for `TFPolicy`:
```python
@DeveloperAPI
class LearningRateSchedule:
    """Mixin for TFPolicy that adds a learning rate schedule."""

    @DeveloperAPI
    def __init__(self, lr, lr_schedule):
        self._lr_schedule = None
        if lr_schedule is None:
            self.cur_lr = tf1.get_variable(
                "lr", initializer=lr, trainable=False)
        else:
            self._lr_schedule = PiecewiseSchedule(
                lr_schedule, outside_value=lr_schedule[-1][-1],
                framework=None)
```
From a quick look through the code, it seems that gamma is not stored as a member variable of the policy but lives in the config, so I think there are two potential approaches:
1. Create a `GammaMixin` following the example above and modify `self.config["gamma"]` inside it (see the first sketch below).
2. You might also be able to use the `on_learn_on_batch` callback (see the second sketch below).
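For approach 1, here is a rough sketch of what such a mixin could look like, modeled on the `LearningRateSchedule` mixin quoted above. To be clear, everything in it is my own guess rather than an existing RLlib API: the class name `GammaSchedule`, the `gamma_schedule` argument (a list of `[timestep, gamma]` pairs in the same format as `lr_schedule`), and the assumption that overwriting `self.config["gamma"]` is enough. It can only work where gamma is read from the config at runtime (e.g. `compute_advantages()` during postprocessing); it will not change a gamma that has already been baked into a static TF loss graph.

```python
# Hypothetical sketch only: GammaSchedule and gamma_schedule are not part of
# RLlib; they mirror the LearningRateSchedule mixin quoted above.
from ray.rllib.policy.policy import Policy
from ray.rllib.utils.annotations import override
from ray.rllib.utils.schedules import PiecewiseSchedule


class GammaSchedule:
    """Mixin that anneals self.config["gamma"] with the global timestep."""

    def __init__(self, gamma_schedule):
        # gamma_schedule: list of [timestep, gamma] pairs, same format as
        # lr_schedule, e.g. [[0, 0.95], [1000000, 0.99]].
        self._gamma_schedule = None
        if gamma_schedule is not None:
            self._gamma_schedule = PiecewiseSchedule(
                gamma_schedule,
                outside_value=gamma_schedule[-1][-1],
                framework=None)

    @override(Policy)
    def on_global_var_update(self, global_vars):
        super().on_global_var_update(global_vars)
        if self._gamma_schedule is not None:
            # Overwrite the config value that runtime code such as
            # compute_advantages() reads during postprocessing.
            self.config["gamma"] = self._gamma_schedule.value(
                global_vars["timestep"])
```

You would then have to attach the mixin to your policy (e.g. via the `mixins=[...]` argument of `build_tf_policy` or `YourTFPolicy.with_updates()`) and call its `__init__` wherever the other mixins are set up, the same way `LearningRateSchedule` is wired in.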
One thing I am not sure about is whether these approaches will work if you are using an asynchronous training method like APPO, APEX, or IMPALA.
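As for the callback route (approach 2), a minimal sketch could look like the following. Again, the class name, the example schedule values, and the assumption that overwriting `policy.config["gamma"]` on the learner is sufficient are all mine, not something RLlib provides out of the box.

```python
# Hypothetical sketch only: anneals gamma via the on_learn_on_batch hook.
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.utils.schedules import PiecewiseSchedule


class GammaAnnealingCallbacks(DefaultCallbacks):
    def __init__(self):
        super().__init__()
        # Example values only: ramp gamma from 0.95 to 0.99 over 1M timesteps.
        self._gamma_schedule = PiecewiseSchedule(
            [[0, 0.95], [1000000, 0.99]],
            outside_value=0.99,
            framework=None)

    def on_learn_on_batch(self, *, policy, train_batch, **kwargs):
        # policy.global_timestep is kept up to date by the trainer's
        # global-var updates.
        policy.config["gamma"] = self._gamma_schedule.value(
            policy.global_timestep)


# Usage: pass the class via the trainer config, e.g.:
# config = {"callbacks": GammaAnnealingCallbacks, ...}
```

Note that `on_learn_on_batch` runs where the loss is computed, so this only changes gamma on the learner's copy of the policy; whether the rollout workers (where advantages are computed during postprocessing) ever see the new value is exactly the kind of thing that could break with the asynchronous setups mentioned above.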
There are a couple of issues / pull requests open on related problems right now:
GitHub issue (opened 07:16AM - 20 May 21 UTC; labels: P0, bug, release-blocker, rllib):
Since ray 1.2, IMPALA (and any other tf algorithm that uses `entropy_coeff`) has slowed down due to a bug.
- The bug is caused by a tf-(static graph)-op being added to the graph each time we call `on_global_var_update` (defined inside the `EntropyCoeffSchedule` (tf) mixin class).
To reproduce:
`rllib train -f rllib/tuned_examples/impala/pong-impala-fast.yaml`
Runs much slower than in ray<=1.1.
GitHub issue (opened 03:43PM - 19 May 21 UTC; labels: bug, triage):
### What is the problem?
`cur_lr` isn't updated on the local worker when `lr_schedule` is set in the config. The visible impact: `cur_lr` in the training stats is not updating.
*Ray version and other system information (Python version, TensorFlow version, OS):*
- Ray: 1.3.0
- Tensorflow: 2.5.0
### Reproduction (REQUIRED)
```python
import ray
from ray.rllib import agents
ray.init()
config = {
    'lr': 1e-4,
    'lr_schedule': [
        [0, 1e-4],
        [1000, 1e-9],
    ],
    'num_workers': 2,
    'train_batch_size': 10,
    'learning_starts': 10,
    'timesteps_per_iteration': 10,
}
trainer = agents.dqn.ApexTrainer(env='CartPole-v0', config=config)
while True:
    results = trainer.train()
    print(results['iterations_since_restore'],
          results['info']['learner']['default_policy']['cur_lr'])
```
Watch statistics printed:
```
1 9.999999747378752e-05
2 9.999999747378752e-05
3 9.999999747378752e-05
```