PolicyClient should be smart about done agents

On the client side of a client-server run, we call client.get_action(obs). This generates actions for all agents present in the observation, even if those agents are done. This is different from the other modes of running RLlib, where done agents do not continue producing actions.

The workaround is very simple: I just locally keep track of which agents are done in the episode and filter them out of the action dict before passing it to env.step. But it would be nice if the behavior were consistent with the other modes of RLlib. That would probably require giving the client the done information as well, e.g. client.get_action(obs, done). This gets kinda sticky when we think about also including the rewards there instead of in a separate log_returns call.
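For reference, the workaround looks roughly like this (a minimal sketch assuming RLlib's PolicyClient with a multi-agent env; MyMultiAgentEnv is a placeholder and exact signatures may vary a bit between versions):

```python
from ray.rllib.env.policy_client import PolicyClient

client = PolicyClient("http://localhost:9900", inference_mode="remote")
env = MyMultiAgentEnv()  # placeholder multi-agent env

obs = env.reset()
eid = client.start_episode(training_enabled=True)
done_agents = set()  # locally track which agents have already finished

while True:
    # get_action returns actions for every agent in obs, even done ones
    actions = client.get_action(eid, obs)
    # drop actions for agents that have already reported done=True
    actions = {aid: a for aid, a in actions.items() if aid not in done_agents}

    obs, rewards, dones, infos = env.step(actions)
    client.log_returns(eid, rewards)

    # remember newly-done agents so they are skipped on the next step
    done_agents.update(aid for aid, d in dones.items() if d and aid != "__all__")

    if dones["__all__"]:
        client.end_episode(eid, obs)
        break
```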


Great point @rusu24edward, would you be able to provide a PR where you try this out?