Policy metrics are not logged when training APPO with a GPU?

Hi there,

I am using APPO to train a model and have found something odd.
With ‘num_gpus’ set to ‘0’, the policy metrics (under info/learner/default_policy/…) are logged. However, with ‘num_gpus’ set to ‘1’, those metrics are not logged.
I don’t know whether this is caused by my setup, the logger module, or the training module itself.
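To check this outside of TensorBoard, one option is to look at the training result directly and see whether anything sits under the info/learner/default_policy path. Below is a minimal sketch, assuming the result is a nested dict that mirrors that path. The helper name `learner_metric_keys`, the two sample dicts, and the metric names in them are hypothetical placeholders, not real RLlib output.

```python
from typing import Any, Dict, List


def learner_metric_keys(result: Dict[str, Any],
                        policy_id: str = "default_policy") -> List[str]:
    """Return the sorted metric names under info/learner/<policy_id>.

    Returns an empty list if any level of that path is missing.
    """
    node: Any = result
    for key in ("info", "learner", policy_id):
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    return sorted(node) if isinstance(node, dict) else []


# Hypothetical result dicts shaped like the two cases described above.
cpu_result = {
    "info": {"learner": {"default_policy": {"policy_loss": 0.1, "vf_loss": 0.5}}}
}
gpu_result = {"info": {"learner": {}}}

print(learner_metric_keys(cpu_result))
print(learner_metric_keys(gpu_result))
```

An empty list for the GPU run would suggest the metrics are missing from the result itself rather than being dropped later by the logger.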

Config:

cartpole-appo:
    env: CartPole-v1
    run: APPO
    local_dir: /home/mlokos/appo_test
    stop:
        timesteps_total: 5000000
    config:
        framework: torch
        vtrace: True
        use_kl_loss: False
        rollout_fragment_length: 50
        train_batch_size: 750
        num_workers: 8
        broadcast_interval: 1
        max_sample_requests_in_flight_per_worker: 1
        num_envs_per_worker: 8
        num_sgd_iter: 2
        vf_loss_coeff: 1.0
        clip_param: 0.3
        # num_gpus: 0
        num_gpus: 1
        grad_clip: 10
        model:
          dim: 42

Run command:

rllib train -f cartpole-appo.yaml

Here are some metrics gathered from TensorBoard (orange: with GPU, blue: without GPU).
The GPU run does appear to be learning (the rewards keep improving), but is it omitting the policy metrics?

Hi @mlokos,

Welcome to the forum. I am not certain, but I have a hunch that the behavior you are seeing is related to this issue: