Hi all, I have a simple question. I'm training the Pong-v0 Gym env with a PPO trainer, and while analyzing the underlying Keras neural network (a CNN, visionnet.py), I noticed it has two outputs: one of size 6 and another of size 1. I understand these are the policy and value network outputs. I'd like to know how these outputs determine the next action to take. For example, when you call agent.compute_action() you provide an env observation and get back the next action, so how is that calculated from the two outputs?
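To make my question concrete, here is my current guess as a standalone sketch (not RLlib's actual code, and `sample_action` is a name I made up): the 6 policy logits are turned into a categorical distribution and an action is sampled (or argmax'ed), while the size-1 value output is only used during training. Is this roughly what happens inside compute_action()?

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def sample_action(policy_logits, explore=True, rng=None):
    """Hypothetical sketch: map 6 policy logits to one discrete Pong action.

    The size-1 value output is NOT used here; my understanding is it only
    feeds the advantage/critic loss during training.
    """
    probs = softmax(np.asarray(policy_logits, dtype=np.float64))
    if explore:
        # Stochastic: sample from the categorical distribution
        rng = rng or np.random.default_rng()
        return int(rng.choice(len(probs), p=probs))
    # Deterministic: pick the most likely action
    return int(np.argmax(probs))

# Example with 6 made-up logits (Pong has 6 discrete actions)
logits = [0.1, 2.0, -1.0, 0.5, 0.0, -0.3]
print(sample_action(logits, explore=False))  # → 1 (index of the largest logit)
```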
Thanks in advance!