mlokos
September 10, 2021, 7:38am
1
Hi there,
I am using the APPO algorithm to train a model and have found something odd.
With `num_gpus` set to `0`, the policy metrics (from `info/learner/default_policy/…`) are being logged. However, when `num_gpus` is set to `1`, those metrics are not logged.
I don't know whether this is related to my setup, the logger module, or the training module itself.
Config:
```
cartpole-appo:
  env: CartPole-v1
  run: APPO
  local_dir: /home/mlokos/appo_test
  stop:
    timesteps_total: 5000000
  config:
    framework: torch
    vtrace: True
    use_kl_loss: False
    rollout_fragment_length: 50
    train_batch_size: 750
    num_workers: 8
    broadcast_interval: 1
    max_sample_requests_in_flight_per_worker: 1
    num_envs_per_worker: 8
    num_sgd_iter: 2
    vf_loss_coeff: 1.0
    clip_param: 0.3
    # num_gpus: 0
    num_gpus: 1
    grad_clip: 10
    model:
      dim: 42
```
Run command: `rllib train -f cartpole-appo.yaml`
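For reference, here is roughly the same experiment expressed through the Python API. This is only a sketch assuming the Ray 1.x-era `tune.run()` entry point; `rllib train -f` effectively launches the same Tune experiment:
```
from ray import tune

# Sketch: roughly equivalent to `rllib train -f cartpole-appo.yaml`
# (assumes the Ray 1.x tune.run() API, where RLlib algorithms are
# registered with Tune by name).
tune.run(
    "APPO",
    local_dir="/home/mlokos/appo_test",
    stop={"timesteps_total": 5_000_000},
    config={
        "env": "CartPole-v1",
        "framework": "torch",
        "vtrace": True,
        "use_kl_loss": False,
        "rollout_fragment_length": 50,
        "train_batch_size": 750,
        "num_workers": 8,
        "broadcast_interval": 1,
        "max_sample_requests_in_flight_per_worker": 1,
        "num_envs_per_worker": 8,
        "num_sgd_iter": 2,
        "vf_loss_coeff": 1.0,
        "clip_param": 0.3,
        "num_gpus": 1,  # set to 0 to reproduce the run where the stats do appear
        "grad_clip": 10,
        "model": {"dim": 42},
    },
)
```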
Here are some metrics gathered from TensorBoard (orange = with GPU, blue = without GPU).
It looks like the GPU run is learning (the rewards keep improving), but is it dropping the policy metrics?
mannyv
September 10, 2021, 1:29pm
2
Hi @mlokos ,
Welcome to the forum. I am not certain, but I have a hunch that the behavior you are seeing is related to this issue:
GitHub issue (opened 26 Aug 2021, labels: bug, triage):
### What is the problem?
When multiple GPUs are used, learner stats are gathered in the `learn_on_loaded_batch` method with a `"tower_X"` key wrapped around the stats:
https://github.com/ray-project/ray/blob/master/rllib/policy/torch_policy.py#L645
for example:
```
{'tower_0': {'learner_stats': {'cur_lr': 0.000495184, 'policy_loss': -40.517921447753906, 'entropy': 1.8380180597305298, 'entropy_coeff': 0.0005, 'var_gnorm': 17.741676330566406, 'vf_loss': 0.7066917419433594, 'vf_explained_var': array([0.5089742], dtype=float32), 'mean_rhos': 1.0025060176849365, 'std_rhos': 0.39850014448165894}}}
```
This `tower_0` key is not taken into account when `get_learner_stats()` is used:
https://github.com/ray-project/ray/blob/089dd9b94924cccb85ba1affccc4b0c8907f192c/rllib/execution/multi_gpu_learner_thread.py#L102
This causes the policy learner_stats to get dropped when GPUs are used.
## CPU does not drop these stats
This differs from the single-GPU/CPU case, where the `learn_on_loaded_batch` function returns:
```
{'learner_stats': {'cur_lr': 0.000495184, 'policy_loss': -40.517921447753906, 'entropy': 1.8380180597305298, 'entropy_coeff': 0.0005, 'var_gnorm': 17.741676330566406, 'vf_loss': 0.7066917419433594, 'vf_explained_var': array([0.5089742], dtype=float32), 'mean_rhos': 1.0025060176849365, 'std_rhos': 0.39850014448165894}}
```
(Note the absence of the `tower_X` key.)
Similar code can then extract the policy metrics, and that works:
https://github.com/ray-project/ray/blob/089dd9b94924cccb85ba1affccc4b0c8907f192c/rllib/execution/learner_thread.py#L80
*Ray version and other system information (Python version, TensorFlow version, OS):*
version: latest dev 2.0.0
python: 3.8
macosx + linux
torch + tensorflow
### Reproduction (REQUIRED)
Run anything with GPU learners (in my case, specifically IMPALA).
If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".
- [ ] I have verified my script runs in a clean environment and reproduces the issue.
- [x] I have verified the issue also occurs with the [latest wheels](https://docs.ray.io/en/master/installation.html).
@sven1977
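Until that is fixed upstream, one possible interim workaround is to flatten the `tower_X` level yourself before reading the stats. This is only a sketch; `unwrap_tower_stats` is a hypothetical helper, not part of RLlib's API:
```
def unwrap_tower_stats(fetches):
    """Return learner_stats whether or not they are nested under a tower key.

    Hypothetical helper: with GPU learners, learn_on_loaded_batch() may nest
    results as {"tower_0": {"learner_stats": {...}}}, while on CPU it returns
    {"learner_stats": {...}} directly.
    """
    tower_keys = sorted(k for k in fetches if k.startswith("tower_"))
    if tower_keys:
        # Take the first tower's stats (per-tower stats may differ slightly,
        # since each tower computes on its own shard of the batch).
        return fetches[tower_keys[0]].get("learner_stats", {})
    return fetches.get("learner_stats", {})


# Both shapes from the issue above resolve to the same stats dict:
multi_gpu = {"tower_0": {"learner_stats": {"cur_lr": 0.000495184}}}
single = {"learner_stats": {"cur_lr": 0.000495184}}
assert unwrap_tower_stats(multi_gpu) == unwrap_tower_stats(single)
```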