PolicyClient should be agent-done smart

rusu24edward · December 28, 2021, 8:49pm

On the client side of a client-server run, we need to call client.get_action(obs). This generates actions for every agent present in the observation, even agents that are already done. That differs from other modes of running RLlib, where done agents stop producing actions.

The workaround is very simple: I locally keep track of which agents are done in the episode and filter those out of the action dict before passing it to env.step. But it would be nice if the behavior were consistent with other modes of RLlib. This would probably require giving the client the done information as well, like client.get_action(obs, done). This gets kinda sticky when we think about including the rewards there as well, instead of logging them through the separate log_returns function.
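For anyone hitting the same thing, here is a rough sketch of the workaround described above. It is not RLlib code: `DoneAgentFilter` and `run_episode` are hypothetical names I made up for illustration. The `client` argument is assumed to behave like a PolicyClient (start_episode, get_action, log_returns, end_episode), and the env is assumed to be a multi-agent env whose step() returns (obs, rewards, dones, infos) dicts with an "__all__" key in dones.

```python
from typing import Any, Dict, Set


class DoneAgentFilter:
    """Hypothetical helper: remembers which agents are done this episode."""

    def __init__(self) -> None:
        self._done_agents: Set[Any] = set()

    def reset(self) -> None:
        # Call at the start of every episode.
        self._done_agents.clear()

    def update(self, dones: Dict[Any, bool]) -> None:
        # Record agents the env flagged as done; "__all__" is not an agent.
        for agent_id, done in dones.items():
            if agent_id != "__all__" and done:
                self._done_agents.add(agent_id)

    def filter_actions(self, actions: Dict[Any, Any]) -> Dict[Any, Any]:
        # Drop actions for agents that finished on an earlier step.
        return {
            agent_id: action
            for agent_id, action in actions.items()
            if agent_id not in self._done_agents
        }


def run_episode(client: Any, env: Any, done_filter: DoneAgentFilter) -> None:
    # Assumption: `client` exposes PolicyClient-style methods and `env`
    # returns (obs, rewards, dones, infos) from step().
    done_filter.reset()
    episode_id = client.start_episode()
    obs = env.reset()
    while True:
        # The client may return actions for done agents too ...
        actions = client.get_action(episode_id, obs)
        # ... so strip those out before stepping the env.
        obs, rewards, dones, infos = env.step(done_filter.filter_actions(actions))
        client.log_returns(episode_id, rewards)
        done_filter.update(dones)
        if dones.get("__all__", False):
            break
    client.end_episode(episode_id, obs)
```

The filtering happens entirely on the client side, so the server still sees observations for done agents. That is the part that would need the done information passed through if the behavior were to match the other modes.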

sven1977 · January 12, 2022, 3:20pm

Great point, @rusu24edward. Would you be able to provide a PR where you try this out?
