How to share observations and rewards in a multi-agent ExternalEnv?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi everyone!
I am trying to implement a multi-agent environment in a server-client configuration. I don't understand how to send the observations and rewards for the different agents.

In my case I have five agents, which I configured on the server side as follows:

from gym import spaces
from ray.rllib.policy.policy import PolicySpec

config = {
        # === Settings for Multi-Agent Environments ===
        "multiagent": {
            # Map of type MultiAgentPolicyConfigDict from policy ids to tuples
            # of (policy_cls, obs_space, act_space, config). This defines the
            # observation and action spaces of the policies and any extra config.
            "policies": {
                "agent_1": PolicySpec(
                    policy_class=None,  # infer automatically from Algorithm
                    observation_space=spaces.Box(float("-inf"), float("inf"), (9,)),
                    action_space=spaces.Discrete(10),  # 5 binary actuators; their combinations give 2^5
                    config={"gamma": 0.99, "lr": 0.001}
                ),
                "agent_2": PolicySpec(
                    policy_class=None,  # infer automatically from Algorithm
                    observation_space=spaces.Box(float("-inf"), float("inf"), (9,)),
                    action_space=spaces.Discrete(10),  # 5 binary actuators; their combinations give 2^5
                    config={"gamma": 0.99, "lr": 0.001}
                ),
                "agent_3": PolicySpec(
                    policy_class=None,  # infer automatically from Algorithm
                    observation_space=spaces.Box(float("-inf"), float("inf"), (7,)),
                    action_space=spaces.Discrete(2),
                    config={"gamma": 0.99, "lr": 0.001}
                ),
                "agent_4": PolicySpec(
                    policy_class=None,  # infer automatically from Algorithm
                    observation_space=spaces.Box(float("-inf"), float("inf"), (7,)),
                    action_space=spaces.Discrete(2),
                    config={"gamma": 0.99, "lr": 0.001}
                ),
                "agent_5": PolicySpec(
                    policy_class=None,  # infer automatically from Algorithm
                    observation_space=spaces.Box(float("-inf"), float("inf"), (4,)),
                    action_space=spaces.Discrete(2),
                    config={"gamma": 0.99, "lr": 0.001}
                )
            },
            "policy_mapping_fn": None,
        }
    }
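
For context, on the server side this config is typically fed to an algorithm whose experience input is a PolicyServerInput. A minimal sketch, assuming PPO and a localhost:9900 address (neither of which is stated in the original post), on a Ray version from around the time of this thread:

from ray.rllib.env.policy_server_input import PolicyServerInput
from ray.rllib.agents.ppo import PPOTrainer

config.update({
    # No environment on the server; observations come from the client(s)
    "env": None,
    # Serve actions over HTTP to connected PolicyClients
    "input": lambda ioctx: PolicyServerInput(ioctx, "localhost", 9900),
    "num_workers": 0,
    # Disable off-policy estimation for externally generated experiences
    "input_evaluation": [],
})

trainer = PPOTrainer(config=config)
while True:
    trainer.train()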

On the client side I need to send the observations to ask the server for actions, and to log rewards and new observations. How can I do that?
I tried the following code to send observations and rewards:

# Get observations from the external environment for each agent
observations = {
      "agent_1": HVAC_H_obs,
      "agent_2": HVAC_C_obs,
      "agent_3": NW_obs,
      "agent_4": SW_obs,
      "agent_5": NWB_obs
      }

# Ask the server for actions (returned as a dict keyed by agent id)
actions = client.get_action(eid, observations)

# After applying the actions in the environment, rewards are calculated from the new observations
rewards = {
      "agent_1": HVAC_H_rew,
      "agent_2": HVAC_C_rew,
      "agent_3": NW_rew,
      "agent_4": SW_rew,
      "agent_5": NWB_rew
      }

# The rewards are logged on the server so it can learn from them
client.log_returns(eid, rewards, {}, {})
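
For reference, the client and episode id (eid) used above come from a PolicyClient. A minimal sketch of how they are typically created, assuming the server listens on localhost:9900 (not stated in the original post):

from ray.rllib.env.policy_client import PolicyClient

# Connect to the policy server; inference_mode="remote" asks the server for every action
client = PolicyClient("http://localhost:9900", inference_mode="remote")

# Start a new episode and keep its id to tag observations and rewards
eid = client.start_episode(training_enabled=True)

# ... loop: get_action / apply actions / log_returns as shown above ...

# When the episode finishes, report the final observations
client.end_episode(eid, observations)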

And I have the following error:

-- Raw obs from env: { '1': { 'agent_1': [1, 0, 0.0, 0.0, 23.766666666666666, 20.72565539572239, 0.2833333333333333, 178.66666666666666, 53.515461867926604],
                              'agent_2': [1, 0, 0.0, 0.0, 23.766666666666666, 20.72565539572239, 0.2833333333333333, 178.66666666666666, 53.515461867926604],
                              'agent_3': [1, 0, 23.766666666666666, 20.72565539572239, 0.2833333333333333, 178.66666666666666, 53.515461867926604],
                              'agent_5': [1, 0, 23.766666666666666, 20.72565539572239],
                              'agent_4': [1, 0, 23.766666666666666, 20.72565539572239, 0.2833333333333333, 178.66666666666666, 53.515461867926604]}}
2022-07-26 20:10:49,246 INFO sampler.py:665 -- Info return from env: { '1': { 'agent_1': {}, 'agent_2': {}, 'agent_3': {}, 'agent_5': {}, 'agent_4': {}}}
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\grhen\AppData\Local\Programs\Python\Python39\lib\site-packages\ray\rllib\env\policy_client.py", line 303, in run
    samples = self.rollout_worker.sample()
  File "C:\Users\grhen\AppData\Local\Programs\Python\Python39\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 825, in sample
    batches = [self.input_reader.next()]
  File "C:\Users\grhen\AppData\Local\Programs\Python\Python39\lib\site-packages\ray\rllib\evaluation\sampler.py", line 115, in next
    batches = [self.get_data()]
  File "C:\Users\grhen\AppData\Local\Programs\Python\Python39\lib\site-packages\ray\rllib\evaluation\sampler.py", line 288, in get_data
    item = next(self._env_runner)
  File "C:\Users\grhen\AppData\Local\Programs\Python\Python39\lib\site-packages\ray\rllib\evaluation\sampler.py", line 671, in _env_runner
    active_envs, to_eval, outputs = _process_observations(
  File "C:\Users\grhen\AppData\Local\Programs\Python\Python39\lib\site-packages\ray\rllib\evaluation\sampler.py", line 893, in _process_observations
    policy_id: PolicyID = episode.policy_for(agent_id)
  File "C:\Users\grhen\AppData\Local\Programs\Python\Python39\lib\site-packages\ray\rllib\evaluation\episode.py", line 175, in policy_for
    raise KeyError(
KeyError: "policy_mapping_fn returned invalid policy id 'default_policy'!"

I cannot figure out how multi-agent environments work in RLlib. Can anyone help me? Thanks!

agent_id and policy_id are not the same thing. It's a little confusing here because you have named your policies with the same names as your agents. The observation dict the client creates is keyed by agent id. The server receives those observations, but it doesn't know which agent goes with which policy, because your policy_mapping_fn is None and RLlib falls back to the default mapping. That is why the error says "policy_mapping_fn returned invalid policy id 'default_policy'!": you don't have a policy named default_policy.

To fix this, simply change your config so that the policy mapping function maps the agent id to the policy id. In your case they are named the same, so you can simply do this:

"policy_mapping_fn": lambda agent_id: agent_id
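
As a minimal sketch (assuming the config dict from the question is in scope), defining the mapping function explicitly also keeps it compatible with newer Ray versions, where the function may be called with extra arguments such as episode and worker:

# Accept and ignore any extra arguments newer RLlib versions may pass
# (e.g. episode, worker); the agent id doubles as the policy id here.
def policy_mapping_fn(agent_id, *args, **kwargs):
    return agent_id

config["multiagent"]["policy_mapping_fn"] = policy_mapping_fn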

This issue is not unique to client-server actually. Even if you ran this with the normal tune/trainer setup, you would get the same error.
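
For example, a hypothetical sketch of the regular tune setup ("my_multi_agent_env" stands in for a registered MultiAgentEnv and is not part of the original post) runs into the same KeyError without a policy mapping function:

from ray import tune

# Without "policy_mapping_fn" in config["multiagent"], this raises the same
# KeyError about 'default_policy'; with the mapping function above it trains normally.
tune.run(
    "PPO",
    config={**config, "env": "my_multi_agent_env"},
)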


Thank you very much. I tried the suggestion and it works. The concepts of agent_id and policy_id are much clearer to me now. :grin:
