Hi @JeanT,
The episode reward is the sum of the rewards from every timestep in an episode. Yes, you can think of it as the return with discount=1.0, i.e. an undiscounted sum.
The mean is taken over episodes, not timesteps. The episodes counted are the new ones sampled during the rollout phase, or during evaluation if it is an evaluation metric.
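
In case a concrete example helps, here is a minimal sketch of that arithmetic in plain Python. The function names and the sample rewards are made up for illustration and are not the library's actual code; it only shows the difference between averaging per episode and averaging per timestep.

```python
from typing import List


def episode_reward(rewards: List[float]) -> float:
    """Undiscounted sum of the per-timestep rewards of one episode."""
    return sum(rewards)


def episode_reward_mean(episodes: List[List[float]]) -> float:
    """Mean of the episode rewards, averaged over episodes (not timesteps)."""
    if not episodes:
        return float("nan")  # no episodes sampled in this iteration
    totals = [episode_reward(ep) for ep in episodes]
    return sum(totals) / len(totals)


# Hypothetical data: three episodes of different lengths.
episodes = [[1.0, 1.0, 1.0], [2.0], [0.0, 4.0]]

# Episode rewards are 3.0, 2.0 and 4.0, so the mean over episodes is 3.0.
mean_over_episodes = episode_reward_mean(episodes)

# For contrast only: dividing by the 6 timesteps would give 1.5,
# which is not how the metric is defined.
mean_over_timesteps = sum(sum(ep) for ep in episodes) / sum(len(ep) for ep in episodes)
```

Note that a short episode counts just as much as a long one in the mean, since each episode contributes exactly one value.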