I’m interested in creating a multi-agent env with two agents in copies of the same (custom) environment. I’m interested in implementing a different reward for the two agents. Is there an easy way to make this happen?
Also, would this make sense with a centralized critic, or would it mess up the value function for the critic?
When I use a centralized critic in these cases I include an agent index variable that indicates which agent the reward belongs to. If the number of agents is small I use a one-hot encoding, and if there are more than 4 I use a binary encoding.
00000100 ← 6th agent id one-hot encoding
0110 ← 6th agent id binary encoding
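The two encodings above can be sketched as small helpers; this is a minimal illustration (the function names are my own, and I use 0-based agent indices, so the 6th agent is index 5 — note the post's binary example `0110` encodes the 1-based id 6 instead):

```python
import numpy as np

def one_hot_id(agent_idx, num_agents):
    """One-hot encoding of a 0-based agent index."""
    v = np.zeros(num_agents, dtype=np.float32)
    v[agent_idx] = 1.0
    return v

def binary_id(agent_idx, num_bits):
    """Fixed-width binary encoding of a 0-based agent index."""
    bits = [(agent_idx >> i) & 1 for i in reversed(range(num_bits))]
    return np.array(bits, dtype=np.float32)

# 6th agent (0-based index 5) in a team of 8:
one_hot_id(5, 8)  # -> [0, 0, 0, 0, 0, 1, 0, 0]
binary_id(5, 4)   # -> [0, 1, 0, 1]
```

Either vector can then be concatenated onto the centralized critic's observation so the value function knows which agent's reward it is predicting.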
@rsv, just a note: in the example you listed, this dict just returns ints for rewards. Would I return a dict from the reward function so that the rewards can be computed dynamically?
The link below shows a simple multi-agent env example. Your environment's step method will return 4 dicts: one each for the new observations, the rewards, the per-agent (and whole-episode) dones, and the extra info.
In the multi-agent case your environment returns a reward dict. RLlib doesn't calculate rewards itself, so you can compute them with any function inside the environment, and the values may be int or float.
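Putting the thread together, here is a minimal sketch of a two-agent env whose step method returns the four dicts and gives each agent a different reward. The class name, agent ids, and reward formulas are all made up for illustration; a real env would subclass RLlib's `MultiAgentEnv` and define observation/action spaces:

```python
# Sketch of the step() contract in RLlib's multi-agent API:
# step() returns four dicts keyed by agent id (obs, rewards, dones, infos),
# with the special "__all__" key in dones marking the end of the episode.
class TwoAgentEnvSketch:
    def __init__(self):
        self.agents = ["agent_0", "agent_1"]
        self.t = 0

    def reset(self):
        self.t = 0
        return {aid: 0.0 for aid in self.agents}  # obs dict

    def step(self, action_dict):
        self.t += 1
        obs = {aid: float(self.t) for aid in self.agents}
        # Different reward functions per agent: the env computes them,
        # RLlib just routes each entry to the matching policy.
        rewards = {
            "agent_0": 1.0,                 # e.g. plain task reward
            "agent_1": 1.0 - 0.1 * self.t,  # e.g. time-penalized variant
        }
        done = self.t >= 5
        dones = {aid: done for aid in self.agents}
        dones["__all__"] = done             # episode ends for everyone
        infos = {aid: {} for aid in self.agents}
        return obs, rewards, dones, infos
```

Because the rewards are just dict entries computed inside step, giving the two agents different reward functions is only a matter of writing different expressions for each key.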