When using a custom env, every step needs to return a reward. But in my situation I don't know the reward until the end of the game, so how can I use the end-of-game reward as the reward for every step in a custom env?
Thanks a lot.
Hi @Superhors,
and welcome to the board! What about simply returning a reward of zero to the agent on every step until the episode ends?
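A minimal sketch of what that could look like, assuming a Gym-style env (the class name, spaces, and termination condition are all made up for illustration):

```python
import gym
from gym import spaces


class SparseRewardEnv(gym.Env):
    """Hypothetical env whose true score is only known at the end of the game."""

    def __init__(self, config=None):
        self.observation_space = spaces.Discrete(10)
        self.action_space = spaces.Discrete(2)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0

    def step(self, action):
        self.steps += 1
        done = self.steps >= 10  # placeholder termination condition
        # Zero reward on intermediate steps; the real score only when done.
        reward = self._final_score() if done else 0.0
        return self.steps % 10, reward, done, {}

    def _final_score(self):
        # Stand-in for however the end-of-game result is computed.
        return 1.0
```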
I think @Superhors wants to see the reward of each step in TensorBoard.
In that case it might work to set `"batch_mode": "complete_episodes"`, so that each iteration collects only complete episodes, each of which ends with the end-of-episode reward. RLlib should then store this in the metrics that can be viewed in TensorBoard.
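For example, with a Ray 1.8-era trainer config (PPO and the registration name `sparse_env` are arbitrary choices here, and `SparseRewardEnv` refers to the sketch above):

```python
import ray
from ray import tune
from ray.tune.registry import register_env

# Assumes the SparseRewardEnv sketch from the earlier post.
register_env("sparse_env", lambda cfg: SparseRewardEnv(cfg))

ray.init()
tune.run(
    "PPO",  # any algorithm works; PPO is just an example
    config={
        "env": "sparse_env",
        "batch_mode": "complete_episodes",  # sample batches hold only whole episodes
    },
    stop={"training_iteration": 5},
)
```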
@Superhors Take a look at Sample Collection
Hi @Superhors,
There seems to be some confusion as to what you are asking.
Are you saying that:
1.) You have an environment that provides only a single reward when it is done, similar to chess or checkers.
In this case you should follow @Lars_Simon_Zehnder’s suggestion and return 0 for all steps but the final one.
2.) You have an environment that needs to provide a reward on every step but you only know what reward to provide after it is done.
This case is a little more complicated. To do this correctly I think you need to use `batch_mode=complete_episodes` and use the `postprocess_trajectory` callback to rewrite the reward history; see the sketch after the links below.
batch_mode: RLlib Sample Collection and Trajectory Views — Ray v1.8.0
callbacks: RLlib Training APIs — Ray v1.8.0
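Here is a minimal sketch of that second approach, assuming the Ray 1.8-era `DefaultCallbacks` API; broadcasting the final reward over the whole episode is my own illustration, not a tested recipe:

```python
import numpy as np
from ray.rllib.agents.callbacks import DefaultCallbacks


class BroadcastFinalReward(DefaultCallbacks):
    """Overwrite every step's reward with the episode's final reward.

    Requires batch_mode="complete_episodes", so the whole episode
    arrives in a single trajectory batch.
    """

    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        rewards = postprocessed_batch["rewards"]
        # Copy the end-of-game reward into every timestep of the episode.
        postprocessed_batch["rewards"] = np.full_like(rewards, rewards[-1])
```

You would then pass it via `config={"callbacks": BroadcastFinalReward, "batch_mode": "complete_episodes"}`. One caveat: this callback fires after the policy's own trajectory postprocessing, so for algorithms that derive advantages from rewards (e.g. PPO's GAE) the rewrite may come too late; treat it as a starting point rather than a finished solution.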