How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Situation: I have a multi-agent env that runs inside a policy_client (in local mode) that reports to a policy_server. I adapted the cartpole examples to my needs and everything runs as expected.
What works: Everything works fine except reward logging to the server.
Problem: I don’t understand how the policy_client handles multi-agent rewards. For get_action() I can pass a MultiAgentDict observation, but log_returns() only takes a float value as reward input. When I pass a reward dict, no rewards show up in TensorBoard.

```python
@PublicAPI
def log_returns(
    self,
    episode_id: str,
    reward: float,
    info: Union[EnvInfoDict, MultiAgentDict] = None,
    multiagent_done_dict: Optional[MultiAgentDict] = None,
)
```
Question: How am I supposed to return a multi-agent reward dict to the server?
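In the meantime, the workaround I am considering (just a sketch, not a confirmed RLlib pattern; the helper name is hypothetical) is to collapse the per-agent reward dict into a single scalar before calling log_returns(), since the signature types reward as float:

```python
from typing import Dict


def sum_multiagent_rewards(reward_dict: Dict[str, float]) -> float:
    """Collapse a per-agent reward dict into one scalar, matching the
    float type that log_returns() declares for its reward parameter."""
    return float(sum(reward_dict.values()))


# Hypothetical usage inside the env loop, where `client` is a PolicyClient
# and `rewards` is the MultiAgentDict returned by the env step:
# client.log_returns(episode_id, sum_multiagent_rewards(rewards), info=infos)
```

This obviously loses the per-agent breakdown, which is why I would prefer a supported way to log the dict directly.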