On the client side of a client-server run, we call
client.get_action(obs). This generates actions for all agents present in the observation, even agents that are already done. This differs from other modes of running RLlib, where done agents do not continue producing actions.
The workaround is very simple: I locally keep track of which agents are done in the episode and filter them out of the action dict before passing it to
env.step. But it would be nice if the behavior were consistent with the other modes of RLlib. That would probably require giving the client the done information as well, e.g.
client.get_action(obs, done). This gets kinda sticky once we also think about passing in the rewards the same way instead of via a separate call.
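The filtering workaround above can be sketched with two small helpers. This is a minimal sketch, not RLlib's actual API: it only assumes the usual multi-agent dict conventions (per-agent action/done dicts, with a special "__all__" key in the done dict), and the helper names are mine.

```python
def filter_done_actions(actions, done_agents):
    """Drop actions for agents that have already finished the episode,
    so env.step() only receives actions for live agents."""
    return {agent: act for agent, act in actions.items()
            if agent not in done_agents}


def update_done_agents(done_agents, dones):
    """Record newly-done agents from a multi-agent done dict,
    ignoring the aggregate '__all__' key."""
    done_agents.update(agent for agent, d in dones.items()
                       if agent != "__all__" and d)
    return done_agents
```

In the episode loop, you would call filter_done_actions on the result of client.get_action(obs) right before env.step, then update_done_agents with the done dict that env.step returns.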