So for some reason, using the on_episode_step() method of the custom callback class doesn’t work. It only ever returns the reset observations (i.e. the initial observations of each episode) and not the stepped-forward observations, unless I made some other mistake. I followed the custom metrics and callbacks example class very closely too. I checked, and inside my MultiAgentEnvironment and Environments it does work, as the observation is being updated during tune.run, so it must be some issue or design choice with episode.last_observation_for(). I’m therefore not really sure how to extract observations for easy display.
Is it possible that
For the policy this should work. I am trying to extract the action probabilities via .logp and then plot them on a state grid. Unfortunately, I’m not sure how to get something like 95% intervals without manually sampling.
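One possible way around manual sampling, sketched under an assumption that may not match my setup: if the action space is continuous and the policy outputs a diagonal Gaussian (a mean and a log standard deviation per action dimension), the 95% interval has a closed form. The helper names `logp_to_prob` and `gaussian_interval` and the example numbers are made up for illustration and are not part of RLlib; how the mean and log-std are read out of the policy is left open here. For discrete actions, exp(logp) already gives the exact probability, so no interval is needed.

```python
import math
from statistics import NormalDist


def logp_to_prob(logp):
    """Convert a log-probability (e.g. a .logp value) into a probability."""
    return math.exp(logp)


def gaussian_interval(mean, log_std, level=0.95):
    """Central interval of a 1-D Gaussian action distribution, without sampling.

    Assumes the policy parameterises the action as N(mean, exp(log_std)^2).
    """
    dist = NormalDist(mu=mean, sigma=math.exp(log_std))
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical distribution parameters for a single grid state.
low, high = gaussian_interval(mean=0.2, log_std=math.log(0.5))
print(f"95% interval: [{low:.3f}, {high:.3f}]")
```

Evaluating this once per grid cell would give a lower and an upper bound that can be plotted next to the mean action, which avoids drawing many samples per state.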