When using a custom env, every step needs to return a reward. But in my situation I don't know the reward until the end of the game, so how can I use the end-of-game reward as the reward for every step in a custom env?
Thanks a lot.
Hi @Superhors,
and welcome to the board! What about simply returning a reward of zero to the agent on every step until the episode ends?
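A minimal sketch of what that could look like, assuming a Gym-style env (the class name, spaces, and termination condition are all made up for illustration):

```python
import gym
from gym import spaces


class SparseRewardEnv(gym.Env):
    """Hypothetical env whose true score is only known at the end of the game."""

    def __init__(self, config=None):
        self.observation_space = spaces.Discrete(10)
        self.action_space = spaces.Discrete(2)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0

    def step(self, action):
        self.steps += 1
        done = self.steps >= 10  # placeholder termination condition
        # Zero reward on intermediate steps; the real score only when done.
        reward = self._final_score() if done else 0.0
        return self.steps % 10, reward, done, {}

    def _final_score(self):
        # Stand-in for however the end-of-game result is computed.
        return 1.0
```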
I think @Superhors wants to see the reward of each step in TensorBoard.
In that case it might work to set `"batch_mode": "complete_episodes"`, so that each iteration collects only complete episodes, each of which ends with the end-of-episode reward. RLlib should then store this in the metrics that can be viewed in TensorBoard.
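For example, with a Ray 1.8-era trainer config (PPO and the registration name `sparse_env` are arbitrary choices here, and `SparseRewardEnv` refers to the sketch above):

```python
import ray
from ray import tune
from ray.tune.registry import register_env

# Assumes the SparseRewardEnv sketch from the earlier post.
register_env("sparse_env", lambda cfg: SparseRewardEnv(cfg))

ray.init()
tune.run(
    "PPO",  # any algorithm works; PPO is just an example
    config={
        "env": "sparse_env",
        "batch_mode": "complete_episodes",  # sample batches hold only whole episodes
    },
    stop={"training_iteration": 5},
)
```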
@Superhors Take a look at Sample Collection
Hi @Superhors,
There seems to be some confusion as to what you are asking.
Are you saying that:
1.) You have an environment that provides only a single reward when it is done, similar to chess or checkers.
In this case you should follow @Lars_Simon_Zehnder’s suggestion and return 0 for all steps but the final one.
2.) You have an environment that needs to provide a reward on every step but you only know what reward to provide after it is done.
This case is a little more complicated. To do this correctly I think you need to use `batch_mode=complete_episodes` and use the `postprocess_trajectory` callback to rewrite the reward history; see the sketch after the links below.
batch_mode: RLlib Sample Collection and Trajectory Views — Ray v1.8.0
callbacks: RLlib Training APIs — Ray v1.8.0
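Here is a minimal sketch of that second approach, assuming the Ray 1.8-era `DefaultCallbacks` API; broadcasting the final reward over the whole episode is my own illustration, not a tested recipe:

```python
import numpy as np
from ray.rllib.agents.callbacks import DefaultCallbacks


class BroadcastFinalReward(DefaultCallbacks):
    """Overwrite every step's reward with the episode's final reward.

    Requires batch_mode="complete_episodes", so the whole episode
    arrives in a single trajectory batch.
    """

    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        rewards = postprocessed_batch["rewards"]
        # Copy the end-of-game reward into every timestep of the episode.
        postprocessed_batch["rewards"] = np.full_like(rewards, rewards[-1])
```

You would then pass it via `config={"callbacks": BroadcastFinalReward, "batch_mode": "complete_episodes"}`. One caveat: this callback fires after the policy's own trajectory postprocessing, so for algorithms that derive advantages from rewards (e.g. PPO's GAE) the rewrite may come too late; treat it as a starting point rather than a finished solution.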