Multi-agent Env with different reward functions for different agents?


I’m interested in creating a multi-agent env with two agents in copies of the same (custom) environment. I’d like to implement a different reward for each agent. Is there an easy way to make this happen?

Also, would this make sense with a centralized critic, or would it mess up the value function for the critic?

In the multi-agent environment API, rewards are a dict mapping agent names to their rewards, just like observations.

> print(rewards)
{"car_1": 3, "car_2": -1, "traffic_light_1": 0}

You can calculate each agent’s reward in your env and return it under the key of the agent concerned.
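A reward dict like the one printed above can be built inside your env. Here is a minimal sketch of a helper that does this; the agent names match the example, but the state keys and reward logic are hypothetical:

```python
def compute_rewards(state):
    """Build the per-agent reward dict the multi-agent API expects.

    Each agent can use a completely different reward function;
    the state keys below are illustrative placeholders.
    """
    return {
        "car_1": state["car_1_speed"],         # e.g. reward speed
        "car_2": -state["car_2_collisions"],   # e.g. penalize collisions
        "traffic_light_1": 0,                  # e.g. no shaping for this agent
    }
```

Your env’s step method would call something like this and return the resulting dict alongside the observations.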


When I use a centralized critic in these cases I include an agent index variable that indicates which agent the reward is from. If the number of agents is small I use a one-hot encoding, and if there are more than 4 I use a binary encoding.

00000100 ← 6th agent id one-hot encoding
0110 ← 6th agent id binary encoding
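The two encodings above can be produced with a couple of small helpers (function names are my own; the widths of 8 and 4 match the example):

```python
def one_hot(agent_idx, num_agents):
    """One-hot agent id: a 1 at the agent's position, 0 elsewhere.
    E.g. index 5 of 8 agents -> [0, 0, 0, 0, 0, 1, 0, 0]."""
    vec = [0] * num_agents
    vec[agent_idx] = 1
    return vec

def binary(agent_idx, num_bits):
    """Binary-encoded agent id, most significant bit first.
    E.g. index 6 with 4 bits -> [0, 1, 1, 0]."""
    return [(agent_idx >> b) & 1 for b in reversed(range(num_bits))]
```

Either vector is then concatenated onto the critic’s input so it knows which agent the observation/reward pair belongs to.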

Good luck.

Thank you both, I really appreciate it!

@rsv, just a note: in the example you listed, this dict just returns ints for rewards. Would I return a dict from the reward function so that this output is computed by a function, i.e. dynamic?


The link below shows a simple multi-agent env example. Your environment’s step method will return 4 dicts: one each for the new observations, rewards, dones (per-agent and env-wide), and extra info.
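The shape of those four dicts looks like the following (agent names and values are illustrative, not from a real env):

```python
# Illustrative return values of a multi-agent step(): four dicts,
# each keyed by agent id.
new_obs = {"car_1": [0.0, 1.0], "car_2": [0.5, 0.2]}        # per-agent observations
rewards = {"car_1": 3, "car_2": -1}                          # per-agent rewards
dones = {"car_1": False, "car_2": False, "__all__": False}   # "__all__" ends the episode
infos = {"car_1": {}, "car_2": {}}                           # per-agent extra info
```

Note the special `"__all__"` key in the dones dict, which signals when the whole episode is over.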

I appreciate it, thank you. Thanks for pointing me in the right direction.

Your environment returns the reward dict in the multi-agent case. RLlib doesn’t calculate rewards, so you can compute them with a function in the environment, and each reward may be an int or a float.
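To make the rewards fully dynamic, one option is to map each agent id to its own reward callable and evaluate them inside the env at every step. A sketch, with hypothetical agent names and state keys:

```python
# Map each agent to its own reward function; the env calls these each
# step and returns the resulting dict. All names here are placeholders.
reward_fns = {
    "car_1": lambda state: state["progress"],   # reward distance covered
    "car_2": lambda state: -state["elapsed"],   # penalize time taken
}

def rewards_for(state):
    """Evaluate every agent's own reward function against the current state."""
    return {agent: fn(state) for agent, fn in reward_fns.items()}
```

This keeps the per-agent reward logic in one place and makes it easy to swap a different function in for either agent.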