Log multi-agent rewards from policy_client

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Situation: I have a multi-agent env that runs inside a policy_client (in local mode) that reports to a policy_server. I adapted the CartPole client/server examples to my needs, and everything runs fine.
What works: Everything works except logging rewards to the server.
Problem: I don't understand how the policy_client handles multi-agent rewards. For get_action() I can pass a MultiAgentDict observation, but log_returns() only takes a float as its reward argument. When I pass a reward dict instead, no rewards show up in TensorBoard.

From rllib/env/policy_client.py:

    def log_returns(
        self,
        episode_id: str,
        reward: float,
        info: Union[EnvInfoDict, MultiAgentDict] = None,
        multiagent_done_dict: Optional[MultiAgentDict] = None,
    ):
        ...  # body omitted

Question: How am I supposed to return a multi-agent reward dict to the server?
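For reference, this is roughly the client loop I would expect to work. It assumes that log_returns() accepts a per-agent reward dict despite its `float` annotation, and that passing `multiagent_done_dict` is what selects the multi-agent path; I have not confirmed either. To keep it runnable without a server, `RecordingClient` and `TwoAgentToyEnv` are hypothetical stand-ins; only the method names start_episode, get_action, log_returns and end_episode mirror PolicyClient.

```python
import uuid
from typing import Any, Dict, List, Optional, Tuple

MultiAgentDict = Dict[str, Any]  # agent id -> value, same idea as RLlib's alias


class RecordingClient:
    """Hypothetical stand-in for PolicyClient: same method names, only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def start_episode(self, episode_id: Optional[str] = None,
                      training_enabled: bool = True) -> str:
        episode_id = episode_id or uuid.uuid4().hex
        self.calls.append(("start_episode", episode_id))
        return episode_id

    def get_action(self, episode_id: str, observation: MultiAgentDict) -> MultiAgentDict:
        self.calls.append(("get_action", observation))
        return {agent_id: 0 for agent_id in observation}  # dummy action per agent

    def log_returns(self, episode_id: str, reward: Any, info: Any = None,
                    multiagent_done_dict: Optional[MultiAgentDict] = None) -> None:
        self.calls.append(("log_returns", (reward, multiagent_done_dict)))

    def end_episode(self, episode_id: str, observation: MultiAgentDict) -> None:
        self.calls.append(("end_episode", observation))


class TwoAgentToyEnv:
    """Hypothetical two-agent env that ends after `horizon` steps."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> MultiAgentDict:
        self.t = 0
        return {"agent_0": 0.0, "agent_1": 0.0}

    def step(self, action_dict: MultiAgentDict):
        self.t += 1
        done = self.t >= self.horizon
        obs = {agent_id: float(self.t) for agent_id in action_dict}
        rewards = {"agent_0": 1.0, "agent_1": 0.5}
        dones = {"agent_0": done, "agent_1": done, "__all__": done}
        infos = {agent_id: {} for agent_id in action_dict}
        return obs, rewards, dones, infos


def run_episode(client: RecordingClient, env: TwoAgentToyEnv) -> None:
    episode_id = client.start_episode(training_enabled=True)
    obs = env.reset()
    while True:
        actions = client.get_action(episode_id, obs)  # obs is a MultiAgentDict
        obs, reward_dict, done_dict, info_dict = env.step(actions)
        # Per-agent rewards go in as a dict; the per-agent done dict goes alongside.
        client.log_returns(episode_id, reward_dict, info_dict,
                           multiagent_done_dict=done_dict)
        if done_dict["__all__"]:
            client.end_episode(episode_id, obs)
            break
```

Running `run_episode(RecordingClient(), TwoAgentToyEnv(horizon=3))` records one start_episode, three get_action/log_returns pairs and one end_episode. Whether a real PolicyClient (with `inference_mode="local"`) treats the reward dict the same way is exactly what I am unsure about.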

OK, the basic CartPole client/server example also doesn't show rewards and dones in TensorBoard (at least not correctly)…
I took another look at the policy_client source code, and here is what I saw:

  1. The policy client seems to have an internal environment (self.env) that wraps a RandomEnv or RandomMultiAgentEnv in an ExternalEnv/ExternalMultiAgentEnv if you leave the env in the server config empty (the usual case).
  2. In local (not remote) inference mode, the policy_client's log_returns() then simply calls log_returns() on this ExternalMultiAgentEnv.
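To check my reading of points 1 and 2, here is a minimal model of that routing. All three classes are hypothetical simplifications, not RLlib code; the real client decides single- vs. multi-agent from the config it gets from the server, which this sketch reduces to a boolean flag.

```python
from typing import Dict, Optional


class ExternalEnvModel:
    """Hypothetical single-agent wrapper: keeps one scalar running reward."""

    def __init__(self) -> None:
        self.cur_reward = 0.0

    def log_returns(self, episode_id: str, reward: float,
                    info: Optional[dict] = None) -> None:
        self.cur_reward += reward  # a dict here raises TypeError


class ExternalMultiAgentEnvModel:
    """Hypothetical multi-agent wrapper: keeps a running reward per agent."""

    def __init__(self) -> None:
        self.cur_reward_dict: Dict[str, float] = {}

    def log_returns(self, episode_id: str, reward_dict: Dict[str, float],
                    info_dict: Optional[dict] = None,
                    multiagent_done_dict: Optional[Dict[str, bool]] = None) -> None:
        # Rewards for the same agent are summed until they are handed over.
        for agent_id, rew in reward_dict.items():
            self.cur_reward_dict[agent_id] = self.cur_reward_dict.get(agent_id, 0.0) + rew


class LocalPolicyClientModel:
    """Hypothetical model of a policy client running in local inference mode."""

    def __init__(self, multiagent: bool) -> None:
        # Point 1: the embedded env is wrapped as single- or multi-agent.
        self.env = ExternalMultiAgentEnvModel() if multiagent else ExternalEnvModel()

    def log_returns(self, episode_id, reward, info=None, multiagent_done_dict=None) -> None:
        # Point 2: in local mode the call is simply forwarded to the embedded env.
        if multiagent_done_dict is not None:
            return self.env.log_returns(episode_id, reward, info, multiagent_done_dict)
        return self.env.log_returns(episode_id, reward, info)
```

In this model a reward dict only works if the embedded wrapper is the multi-agent one; handed to the single-agent wrapper it fails on the `+=`.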

A more detailed explanation of how ExternalEnv and ExternalMultiAgentEnv work from there, and how they interact with the server, would be really helpful!
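My current understanding from reading rllib/env/external_env.py, which I would like someone to confirm: each episode talks to the sampler through a pair of queues, and rewards logged with log_returns() are only handed over on the next get_action() or on end_episode(). A heavily simplified model of that handshake follows; `EpisodeModel`, `sampler_loop` and `demo` are hypothetical names, not RLlib classes, and the real code also carries info dicts, off-policy actions and per-agent dones.

```python
import queue
import threading
from typing import Any, Dict, List


class EpisodeModel:
    """Hypothetical, simplified model of one external multi-agent episode."""

    def __init__(self) -> None:
        self.data_queue: queue.Queue = queue.Queue()    # env side -> sampler
        self.action_queue: queue.Queue = queue.Queue()  # sampler -> env side
        self.cur_reward_dict: Dict[str, float] = {}

    def log_returns(self, reward_dict: Dict[str, float]) -> None:
        # Accumulate per agent; nothing is sent to the sampler yet.
        for agent_id, rew in reward_dict.items():
            self.cur_reward_dict[agent_id] = self.cur_reward_dict.get(agent_id, 0.0) + rew

    def _send(self, obs: Dict[str, Any], done: bool) -> None:
        item = {"obs": obs, "reward": self.cur_reward_dict, "done": done}
        self.cur_reward_dict = {}  # reset after handing the rewards over
        self.data_queue.put(item)

    def get_action(self, obs: Dict[str, Any]) -> Dict[str, Any]:
        self._send(obs, done=False)
        return self.action_queue.get(timeout=5.0)  # block until the sampler answers

    def end_episode(self, obs: Dict[str, Any]) -> None:
        self._send(obs, done=True)  # final item, no action expected


def sampler_loop(episode: EpisodeModel, received: List[Dict[str, Any]]) -> None:
    """Hypothetical sampler side: records each item and replies with dummy actions."""
    while True:
        item = episode.data_queue.get(timeout=5.0)
        received.append(item)
        if item["done"]:
            return
        episode.action_queue.put({agent_id: 0 for agent_id in item["obs"]})


def demo() -> List[Dict[str, Any]]:
    episode = EpisodeModel()
    received: List[Dict[str, Any]] = []
    sampler = threading.Thread(target=sampler_loop, args=(episode, received))
    sampler.start()
    obs = {"agent_0": 0.0, "agent_1": 0.0}
    episode.get_action(obs)  # first item: nothing logged yet, so reward is {}
    episode.log_returns({"agent_0": 1.0, "agent_1": 0.5})
    episode.log_returns({"agent_0": 1.0})
    episode.get_action(obs)  # second item carries the summed rewards
    episode.log_returns({"agent_1": 2.0})
    episode.end_episode(obs)  # final item carries whatever is left
    sampler.join()
    return received
```

In this model, `demo()` yields three items whose rewards are `{}`, `{"agent_0": 2.0, "agent_1": 0.5}` and `{"agent_1": 2.0}`. If the real classes behave the same way, rewards logged after the last get_action() only reach the server once end_episode() is called.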