How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Situation: I have a multi-agent env that runs inside a policy_client (in local mode) and reports to a policy_server. I adapted the CartPole examples to my needs and everything runs OK.
What works: Everything works fine except reward logging to the server.
Problem: I don't understand how the policy_client handles multi-agent rewards. For get_action() I can pass a MultiAgentDict observation, but log_returns() only takes a float as the reward input. When I pass a reward dict, no rewards show up in TensorBoard.
From rllib/env/policy_client.py:

@PublicAPI
def log_returns(
    self,
    episode_id: str,
    reward: float,
    info: Union[EnvInfoDict, MultiAgentDict] = None,
    multiagent_done_dict: Optional[MultiAgentDict] = None,
)
Question: How am I supposed to return a multi-agent reward dict to the server?
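As a stopgap while this is unclear, one option is to collapse the per-agent reward dict into a single float before calling log_returns(), and pass the full per-agent breakdown through the info argument (which the signature above does accept as a MultiAgentDict). This is only a workaround sketch under that assumption, not an official RLlib API; the helper name flatten_multiagent_reward is hypothetical.

```python
# Workaround sketch (assumption, not a documented RLlib pattern):
# collapse a MultiAgentDict of rewards into one scalar for
# PolicyClient.log_returns(), keeping the per-agent values in `info`.

def flatten_multiagent_reward(reward_dict):
    """Sum per-agent rewards into a single float."""
    return float(sum(reward_dict.values()))

# Hypothetical per-step rewards from a two-agent env:
rewards = {"agent_0": 1.0, "agent_1": -0.5}
scalar_reward = flatten_multiagent_reward(rewards)

# Then, inside the client loop (client/episode_id are assumed to exist):
# client.log_returns(episode_id, scalar_reward, info=rewards)
```

Note that summing loses per-agent attribution in the server-side episode stats, so this only makes sense if an aggregate return is acceptable for training.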