GPU does not use policy metrics in appo training?

mlokos · September 10, 2021, 7:38am

Hi there,

I am training a model with APPO and have found something odd.
With ‘num_gpus’ set to ‘0’, the policy metrics (under info/learner/default_policy/…) are logged. With ‘num_gpus’ set to ‘1’, those metrics are not logged.
I don’t know whether this comes from my setup, the logger module, or the training module.

Config:

cartpole-appo:
    env: CartPole-v1
    run: APPO
    local_dir: /home/mlokos/appo_test
    stop:
        timesteps_total: 5000000
    config:
        framework: torch
        vtrace: True
        use_kl_loss: False
        rollout_fragment_length: 50
        train_batch_size: 750
        num_workers: 8
        broadcast_interval: 1
        max_sample_requests_in_flight_per_worker: 1
        num_envs_per_worker: 8
        num_sgd_iter: 2
        vf_loss_coeff: 1.0
        clip_param: 0.3
        # num_gpus: 0
        num_gpus: 1
        grad_clip: 10
        model:
          dim: 42

Run command:

rllib train -f cartpole-appo.yaml

Here are some metrics gathered from TensorBoard (orange: with GPU, blue: without GPU).
Training with the GPU does appear to learn (the rewards keep improving), but are the policy metrics being omitted?
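One way to pin down exactly which metrics disappear is to compare the column headers of the two runs' result files. The following is a minimal sketch, assuming each trial directory under `local_dir` contains a `progress.csv` whose header row lists the flattened metric names; the helper names and the sample headers are made up for illustration.

```python
import csv
import io


def metric_columns(csv_file):
    """Return the set of column names from the header row of a CSV file object."""
    reader = csv.reader(csv_file)
    return set(next(reader, []))


def missing_metrics(cpu_csv, gpu_csv, prefix="info/learner/"):
    """List columns under `prefix` that the CPU run has and the GPU run lacks."""
    cpu_cols = metric_columns(cpu_csv)
    gpu_cols = metric_columns(gpu_csv)
    return sorted(c for c in cpu_cols - gpu_cols if c.startswith(prefix))


# Made-up headers standing in for the two runs' progress.csv files
# (assumption: real files would be opened with open(path, newline="")).
cpu_run = io.StringIO(
    "episode_reward_mean,info/learner/default_policy/vf_loss\n1.0,0.5\n"
)
gpu_run = io.StringIO("episode_reward_mean\n1.0\n")

print(missing_metrics(cpu_run, gpu_run))
# prints ['info/learner/default_policy/vf_loss']
```

An empty list would mean both runs log the same learner metrics, which would point at the TensorBoard view rather than at training.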

mannyv · September 10, 2021, 1:29pm

Hi @mlokos,

Welcome to the forum. I am not certain, but I have a hunch that the behavior you are seeing is related to this issue:

github.com/ray-project/ray

[rllib] Policy `learner_stats` get dropped when multi_gpu_learner_thread.py is used (in GPU and multi-GPU use cases).

opened 12:12PM - 26 Aug 21 UTC

Bam4d

bug triage

### What is the problem?

When multiple gpus are used, learner stats are gathered in the `learn_on_loaded_batch` method with a "tower_X" key before the stats: https://github.com/ray-project/ray/blob/master/rllib/policy/torch_policy.py#L645

for example:

```
{'tower_0': {'learner_stats': {'cur_lr': 0.000495184, 'policy_loss': -40.517921447753906, 'entropy': 1.8380180597305298, 'entropy_coeff': 0.0005, 'var_gnorm': 17.741676330566406, 'vf_loss': 0.7066917419433594, 'vf_explained_var': array([0.5089742], dtype=float32), 'mean_rhos': 1.0025060176849365, 'std_rhos': 0.39850014448165894}}}
```

This 'tower_0' is not taken into account when `get_learner_stats()` is used: https://github.com/ray-project/ray/blob/089dd9b94924cccb85ba1affccc4b0c8907f192c/rllib/execution/multi_gpu_learner_thread.py#L102

This causes the policy learner_stats to get dropped when GPUs are used.

## CPU does not drop these stats

This is different from when a single gpu/cpu is used, the `learn_on_loaded_batch` function will return:

```
{'learner_stats': {'cur_lr': 0.000495184, 'policy_loss': -40.517921447753906, 'entropy': 1.8380180597305298, 'entropy_coeff': 0.0005, 'var_gnorm': 17.741676330566406, 'vf_loss': 0.7066917419433594, 'vf_explained_var': array([0.5089742], dtype=float32), 'mean_rhos': 1.0025060176849365, 'std_rhos': 0.39850014448165894}}
```

(note the lack of `tower_X` key)

Similar code can then extract the policy metrics which works! https://github.com/ray-project/ray/blob/089dd9b94924cccb85ba1affccc4b0c8907f192c/rllib/execution/learner_thread.py#L80

*Ray version and other system information (Python version, TensorFlow version, OS):*

version: latest dev 2.0.0
python: 3.8
macosx + linux
torch + tensorflow

### Reproduction (REQUIRED)

Run anything with GPU learners (specifically in my case I'm using IMPALA)

If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".

- [ ] I have verified my script runs in a clean environment and reproduces the issue.
- [x] I have verified the issue also occurs with the [latest wheels](https://docs.ray.io/en/master/installation.html).

@sven1977
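To make the mismatch described in the issue concrete, here is a minimal sketch of a lookup that accepts both result shapes. `extract_learner_stats` is a hypothetical helper, not RLlib's `get_learner_stats()` and not the fix that was applied upstream; averaging across towers is an assumption chosen for illustration, and the stats are trimmed to two numeric fields from the issue's example.

```python
def extract_learner_stats(result):
    """Return learner stats from either a flat or a tower-keyed result dict.

    Hypothetical helper for illustration only.
    """
    # Flat shape: {'learner_stats': {...}}
    if "learner_stats" in result:
        return result["learner_stats"]

    # Tower shape: {'tower_0': {'learner_stats': {...}}, 'tower_1': ...}
    towers = [
        value["learner_stats"]
        for key, value in sorted(result.items())
        if key.startswith("tower_")
        and isinstance(value, dict)
        and "learner_stats" in value
    ]
    if not towers:
        return {}

    # Assumption: numeric stats are averaged across towers.
    return {
        name: sum(tower[name] for tower in towers) / len(towers)
        for name in towers[0]
    }


stats = {"policy_loss": -40.517921447753906, "entropy": 1.8380180597305298}
flat_result = {"learner_stats": stats}
tower_result = {"tower_0": {"learner_stats": stats}}

print(extract_learner_stats(flat_result))
print(extract_learner_stats(tower_result))
```

A lookup that only checks for a top-level `learner_stats` key would return nothing for `tower_result`, which matches the dropped metrics reported above.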
