Hi,
We are currently using a two-trainer workflow with a different policy for each trainer. However, at the end of each iteration we only get the learner info for one of the two policies. I just ran the two_trainer_workflow.py example and I see the same behaviour.
At the end of the iteration, we notice this:
Result for PPO_DQN_MultiAgent_multi_agent_cartpole_c241c_00000:
  ...
  info:
    agent_steps_trained_DQN: 64
    agent_steps_trained_PPO: 1440
    last_target_update_ts: 1000
    learner:
      dqn_policy:
        cur_lr: 0.0005000000237487257
        max_q: 1.2726243734359741
        mean_q: 0.025461304932832718
        mean_td_error: -1.1207715272903442
        min_q: -0.33799782395362854
        model: {}
    num_steps_sampled: 1000
    num_steps_trained: 1064
    num_target_updates: 1
  iterations_since_restore: 1
  node_ip: 192.168.1.7
  ...
As shown above, we only get the learner metrics for dqn_policy, not for ppo_policy.
The problem is that this lets us follow the metrics of only one policy in TensorBoard. I tried to debug this, but without much success. Is there a way to have both policies reported? The two_trainer_workflow.py example should reproduce the output above.
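As a stopgap on our side, we considered merging the per-policy learner dicts from both trainers' results ourselves before logging. Below is a minimal sketch of that idea; `merge_learner_info` is a hypothetical helper (not part of RLlib), and it assumes each result is a plain nested dict shaped like the dump above:

```python
def merge_learner_info(*results):
    """Merge the ``info.learner`` sub-dicts from several trainer result
    dicts so that every policy's metrics survive in one place.

    Hypothetical helper, assuming each result is a plain nested dict
    shaped like the iteration output above.
    """
    merged = {}
    for result in results:
        learner = result.get("info", {}).get("learner", {})
        for policy_id, stats in learner.items():
            # Later results win on key collisions for the same policy.
            merged.setdefault(policy_id, {}).update(stats)
    return merged


# Example with two result dicts shaped like the dump above:
ppo_result = {"info": {"learner": {"ppo_policy": {"cur_lr": 5e-5}}}}
dqn_result = {"info": {"learner": {"dqn_policy": {"mean_q": 0.025}}}}
both = merge_learner_info(ppo_result, dqn_result)
```

With this, `both` contains entries for both `ppo_policy` and `dqn_policy`, which could then be written to TensorBoard manually. But ideally the reporting would include both policies out of the box, which is what we are asking about.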
Thanks