So, I have a rollout function that returns the following structure:
Rollout_results(info=infos,
                states=states,
                values=values,
                actions=actions,
                rewards=rewards,
                win=win,
                logps=logps,
                entropies=entropies,
                dones=dones,
                net_info=network_infos)
Most of this info is used in downstream calculations (e.g., computing GAE).
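For context, here is a minimal sketch of the kind of downstream calculation I mean: GAE computed from the per-step rewards, values, and dones of a single episode. The function name `compute_gae`, the `last_value` bootstrap argument, and the default `gamma` and `lam` are illustrative assumptions, not my actual code and not anything from RLlib.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float = 0.0,  # assumed bootstrap value for the state after the last step
    gamma: float = 0.99,      # assumed discount factor
    lam: float = 0.95,        # assumed GAE lambda
) -> Tuple[List[float], List[float]]:
    """Return (advantages, value_targets) for one rollout, computed backwards in time."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # A terminal step cuts off both the bootstrap value and the running advantage.
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets are the advantages added back onto the value estimates.
    value_targets = [a + v for a, v in zip(advantages, values)]
    return advantages, value_targets
```

This is why I need the per-step rewards, values, and dones rather than only aggregate episode statistics.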
However, to avoid duplicating work that RLlib already does, I want to switch to using trainer.evaluate() instead, since it gracefully handles both the single-agent and multi-agent cases under the hood.
Is there a way to get all of this info out of trainer.evaluate()?