I am training a model with APPO and have found something odd.
With num_gpus set to 0, the policy metrics (under info/learner/default_policy/…) are logged. With num_gpus set to 1, those metrics are not logged.
I don't know whether this is caused by my setup, the logger module, or the training module itself.
cartpole-appo:
    env: CartPole-v1
    run: APPO
    local_dir: /home/mlokos/appo_test
    stop:
        timesteps_total: 5000000
    config:
        framework: torch
        vtrace: True
        use_kl_loss: False
        rollout_fragment_length: 50
        train_batch_size: 750
        num_workers: 8
        broadcast_interval: 1
        max_sample_requests_in_flight_per_worker: 1
        num_envs_per_worker: 8
        num_sgd_iter: 2
        vf_loss_coeff: 1.0
        clip_param: 0.3
        # num_gpus: 0
        num_gpus: 1
        grad_clip: 10
        model:
            dim: 42
rllib train -f cartpole-appo.yaml
Here are some metrics gathered from TensorBoard (orange: with GPU / blue: without GPU).
Training with the GPU does appear to be learning (the rewards keep improving), so is it only the logging of the policy metrics that is being skipped?
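To rule out TensorBoard as the culprit, one option is to inspect the raw per-iteration result dict (or the result.json written under local_dir) and check whether anything exists under info/learner/default_policy at all. Below is a minimal sketch of such a check. The helper name `learner_stat_keys` and both sample dicts are hypothetical and only illustrate the shape I would expect. They are not real output from my runs, and the stat names inside them are made up.

```python
from typing import Any, Dict, List


def learner_stat_keys(result: Dict[str, Any],
                      policy_id: str = "default_policy") -> List[str]:
    """Return the flattened key paths found under info/learner/<policy_id>.

    An empty list means no learner stats were reported for that policy.
    """
    node = result.get("info", {}).get("learner", {}).get(policy_id, {})
    found: List[str] = []

    def walk(prefix: str, value: Any) -> None:
        # Recurse into nested dicts; record the path of every leaf value.
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}/{key}", child)
        else:
            found.append(prefix)

    walk(f"info/learner/{policy_id}", node)
    return sorted(found)


# Hypothetical result dicts, made up for illustration only.
cpu_like = {
    "episode_reward_mean": 120.0,
    "info": {"learner": {"default_policy": {
        "learner_stats": {"policy_loss": -0.01, "vf_loss": 0.5},
    }}},
}
gpu_like = {
    "episode_reward_mean": 130.0,
    "info": {"learner": {}},  # learner section present but empty
}

for name, result in [("cpu_like", cpu_like), ("gpu_like", gpu_like)]:
    keys = learner_stat_keys(result)
    print(name, "->", keys if keys else "no learner stats found")
```

If the GPU run's raw results also come back empty here, the stats are missing before they reach the logger, which would point at the training side rather than at TensorBoard.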