How can I use the end-of-game reward as every step's reward?

When using a custom env, every step needs to return a reward. But in my situation I don't know the reward until the end of the game. So how can I use the end-of-game reward as every step's reward in a custom env?
Thanks a lot.

Hi @Superhors ,

and welcome to the board! What about simply returning a reward of zero to the agent on the intermediate steps?

I think @Superhors wants to see the reward of each step via TensorBoard.


@Roller44 ,

in this case it might work to set "batch_mode": "complete_episodes", so that in each iteration only complete episodes are considered, each of which has an end-of-episode reward. RLlib should then store this in the metrics that can be viewed in TensorBoard.
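To make the placement concrete, here is a minimal sketch of where that setting goes. The env name is hypothetical, and the plain config dict reflects the Ray 1.x trainer-config style referenced later in this thread:

```python
# Minimal sketch (algorithm-agnostic): the relevant part is "batch_mode",
# which tells RLlib to only build sample batches from whole episodes.
config = {
    "env": "MyCustomEnv-v0",            # hypothetical registered env name
    "batch_mode": "complete_episodes",  # never truncate an episode mid-rollout
}

# With complete episodes, the terminal reward is always part of the batch,
# so the episode-reward metrics shown in TensorBoard include it.
print(config["batch_mode"])
```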

@Superhors Take a look at Sample Collection


Hi @Superhors,

There seems to be some confusion as to what you are asking.

Are you saying that:

1.) You have an environment that only provides a single reward when it is done, similar to chess or checkers.

In this case you should follow @Lars_Simon_Zehnder’s suggestion and return 0 for all steps but the final one.
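That suggestion amounts to a sparse-reward env. Here is a sketch of the `step()` logic; it is written without the `gym` dependency for brevity (a real RLlib env would subclass `gym.Env` and define `observation_space` / `action_space`), and the fixed horizon and terminal reward of `1.0` are illustrative assumptions:

```python
class SparseRewardEnv:
    """Gym-style env sketch: reward is 0 on every step except the final one."""

    def __init__(self, horizon=10):
        self.horizon = horizon  # episode length, for illustration
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation (here just the step counter)

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        # Intermediate steps return 0; only the terminal step carries the
        # real end-of-game reward (e.g. +1 for a win, -1 for a loss).
        reward = 1.0 if done else 0.0
        return self.t, reward, done, {}
```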

2.) You have an environment that needs to provide a reward on every step but you only know what reward to provide after it is done.

This case is a little more complicated. To do this correctly, I think you need to use batch_mode=complete_episodes and then use the postprocess_trajectory callback to rewrite the reward history.

batch_mode: RLlib Sample Collection and Trajectory Views — Ray v1.8.0

callbacks: RLlib Training APIs — Ray v1.8.0
