Log multi agent rewards from policy_client

Blubberblub · April 1, 2022, 9:21am

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

Situation: I have a multi-agent env that runs inside a policy_client (in local mode) and reports to a policy_server. I adapted the cartpole examples to my needs, and everything runs OK.
What works: Everything works fine except reward logging to the server.
Problem: I don’t understand how the policy_client handles multi-agent rewards. For get_action() I can pass a MultiAgentDict observation, but log_returns() only takes a float as its reward input. When I pass a reward dict, no rewards show up in TensorBoard.

From rllib/env/policy_client.py:

    @PublicAPI
    def log_returns(
        self,
        episode_id: str,
        reward: float,
        info: Union[EnvInfoDict, MultiAgentDict] = None,
        multiagent_done_dict: Optional[MultiAgentDict] = None,
    ) -> None:
        ...  # body omitted in this excerpt

Question: How am I supposed to return a multi-agent reward dict to the server?

Blubberblub · April 7, 2022, 12:09pm

OK, the basic cartpole client/server example also doesn’t show rewards and dones in TensorBoard (at least not correctly).
I had another look at the policy_client source code, and here is what I saw:

The policy client seems to have an internal environment (self.env) that wraps a RandomEnv or RandomMultiAgentEnv in an ExternalEnv/ExternalMultiAgentEnv if you leave the env in the server config empty (the usual case).
The log_returns function of the policy_client then only calls the log_returns function of this ExternalMultiAgentEnv, and it does so only if you run in local rather than remote mode.
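
To make the delegation described above concrete, here is a minimal sketch in plain Python. It is not RLlib code: StubPolicyClient and StubExternalMultiAgentEnv are hypothetical stand-ins, and the shape of the message queued in remote mode is an assumption made for illustration only. It just shows how a per-agent reward dict could be accumulated when the client forwards log_returns to its internal env in local mode.

```python
from typing import Dict, List


class StubExternalMultiAgentEnv:
    """Hypothetical stand-in for the env the client wraps (not RLlib code)."""

    def __init__(self) -> None:
        # episode_id -> agent_id -> accumulated reward
        self.returns: Dict[str, Dict[str, float]] = {}

    def log_returns(self, episode_id: str, reward_dict: Dict[str, float]) -> None:
        # Add each agent's reward to its running total for this episode.
        totals = self.returns.setdefault(episode_id, {})
        for agent_id, reward in reward_dict.items():
            totals[agent_id] = totals.get(agent_id, 0.0) + reward


class StubPolicyClient:
    """Hypothetical client mirroring the local/remote split described above."""

    def __init__(self, local: bool) -> None:
        self.local = local
        self.env = StubExternalMultiAgentEnv()
        # Messages that would be sent to the server in remote mode (assumed shape).
        self.sent: List[dict] = []

    def log_returns(self, episode_id: str, reward: Dict[str, float]) -> None:
        if self.local:
            # Local mode: forward straight to the internal env.
            self.env.log_returns(episode_id, reward)
        else:
            # Remote mode: only queue a message; the internal env is untouched.
            self.sent.append({"episode_id": episode_id, "reward": reward})


client = StubPolicyClient(local=True)
client.log_returns("ep0", {"agent_0": 1.0, "agent_1": -0.5})
client.log_returns("ep0", {"agent_0": 0.5})
print(client.env.returns["ep0"])
```

In this sketch the per-agent totals live only in the client-side env, so whether they ever reach the server's metrics depends on how the real ExternalMultiAgentEnv hands episodes over, which is exactly the part that remains unclear here.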

A more detailed explanation of how ExternalEnv and ExternalMultiAgentEnv work from there and interact with the server would be really helpful!
