Hi,
We are currently using a two-trainer workflow with a different policy for each trainer. However, at the end of each iteration we only get the learner info for one of the two policies. I just ran the two_trainer_workflow.py example and I see the same behaviour.
At the end of the iteration, we notice this:
Result for PPO_DQN_MultiAgent_multi_agent_cartpole_c241c_00000:
  ...
  info:
    agent_steps_trained_DQN: 64
    agent_steps_trained_PPO: 1440
    last_target_update_ts: 1000
    learner:
      dqn_policy:
        cur_lr: 0.0005000000237487257
        max_q: 1.2726243734359741
        mean_q: 0.025461304932832718
        mean_td_error: -1.1207715272903442
        min_q: -0.33799782395362854
        model: {}
    num_steps_sampled: 1000
    num_steps_trained: 1064
    num_target_updates: 1
  iterations_since_restore: 1
  node_ip: 192.168.1.7
  ...
As shown above, we only get the learner metrics for dqn_policy, not for ppo_policy.
The problem is that this lets us follow the metrics of only one policy in TensorBoard. I tried to debug this, but without much success. Is there a way to have both policies reported? The two_trainer_workflow.py example should reproduce the output above.
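As a stopgap on our side, we considered merging the per-policy learner dicts from both trainers' results ourselves before logging. Below is a minimal sketch of that idea; `merge_learner_info` is a hypothetical helper (not part of RLlib), and it assumes each result is a plain nested dict shaped like the dump above:

```python
def merge_learner_info(*results):
    """Merge the ``info.learner`` sub-dicts from several trainer result
    dicts so that every policy's metrics survive in one place.

    Hypothetical helper, assuming each result is a plain nested dict
    shaped like the iteration output above.
    """
    merged = {}
    for result in results:
        learner = result.get("info", {}).get("learner", {})
        for policy_id, stats in learner.items():
            # Later results win on key collisions for the same policy.
            merged.setdefault(policy_id, {}).update(stats)
    return merged


# Example with two result dicts shaped like the dump above:
ppo_result = {"info": {"learner": {"ppo_policy": {"cur_lr": 5e-5}}}}
dqn_result = {"info": {"learner": {"dqn_policy": {"mean_q": 0.025}}}}
both = merge_learner_info(ppo_result, dqn_result)
```

With this, `both` contains entries for both `ppo_policy` and `dqn_policy`, which could then be written to TensorBoard manually. But ideally the reporting would include both policies out of the box, which is what we are asking about.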
Thanks