Multi reward optimization

Ofir_Abu · September 27, 2021, 2:22pm

Is there a way to train an agent whose action space contains multiple actions (a Tuple action space) so that each action component is trained with a different loss function?

sven1977 · September 27, 2021, 3:27pm

Seems like a complicated setup. Why don’t you create two environments with different reward functions and then see which Trainer performs better (e.g. in a Tune search):

from ray import tune

config = {
    "env": MyEnvClass,
    "env_config": {
        # Each trial gets a different reward function.
        "reward_function": tune.grid_search(["A", "B"]),
    },
}

Then:

trial_results = tune.run("PPO", config=config)
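For the grid search above to have an effect, MyEnvClass has to read the "reward_function" key from the env_config that RLlib passes to its constructor. A minimal sketch follows; the reward functions reward_a and reward_b, the REWARD_FUNCTIONS table, and the toy step logic are all hypothetical, and a real env would subclass gym.Env and declare observation and action spaces.

```python
# Hypothetical reward functions, keyed by the names used in the grid search.
def reward_a(obs, action):
    # Example: reward the agent for matching its action to the observation.
    return 1.0 if action == obs else 0.0


def reward_b(obs, action):
    # Example: penalize large actions.
    return -abs(action)


REWARD_FUNCTIONS = {"A": reward_a, "B": reward_b}


class MyEnvClass:
    """Toy stand-in for a gym.Env; only the reward selection is shown."""

    def __init__(self, env_config):
        # RLlib passes config["env_config"] to the env constructor.
        self.reward_fn = REWARD_FUNCTIONS[env_config["reward_function"]]
        self.obs = 0

    def reset(self):
        self.obs = 0
        return self.obs

    def step(self, action):
        # Reward is computed on the observation the action was taken in.
        reward = self.reward_fn(self.obs, action)
        self.obs = (self.obs + 1) % 3
        done = self.obs == 0
        return self.obs, reward, done, {}
```

Each Tune trial then builds its env with a different reward function while the rest of the setup stays identical, so the trials can be compared directly.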

Ofir_Abu · September 27, 2021, 3:30pm

Thanks for the quick response!
The reason is that the two rewards don’t represent the same task.

For example, I would like one reward to be the environmental reward and the other a separate, self-computed one that tries to encourage a certain behavior.

gjoliver · September 27, 2021, 5:38pm

Can you combine the rewards, e.g. as a weighted linear sum, so that the agent is aware of all these tradeoffs?
Are you training multiple branches of your NN with these loss functions separately?
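A minimal sketch of the linear-combination idea, assuming an env whose step() returns (obs, reward, done, info) and a hypothetical shaping_fn that computes the self-made reward. The weights w_env and w_shaped are illustrative; a real implementation would likely subclass gym.Wrapper instead of this plain class.

```python
class LinearRewardWrapper:
    """Replaces the env reward with w_env * env_reward + w_shaped * shaped_reward."""

    def __init__(self, env, shaping_fn, w_env=1.0, w_shaped=0.1):
        self.env = env
        self.shaping_fn = shaping_fn  # hypothetical: (obs, action) -> float
        self.w_env = w_env
        self.w_shaped = w_shaped

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, env_reward, done, info = self.env.step(action)
        shaped = self.shaping_fn(obs, action)
        # Keep both components in info so they can still be logged separately.
        info = dict(info, env_reward=env_reward, shaped_reward=shaped)
        reward = self.w_env * env_reward + self.w_shaped * shaped
        return obs, reward, done, info
```

The tradeoff is that a single scalar reward trains every part of the policy on the same mixture, which is exactly what the question below tries to avoid.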

Ofir_Abu · September 27, 2021, 6:08pm

Yes, that’s a good idea, but I’m really interested in the case where the second kind of action is driven only by the second reward. Is there a way to define a “loss per policy head”?

rusu24edward · September 28, 2021, 5:48pm

You may be able to break this down as a multi-agent setup. You have a single agent interacting with the simulation, and that agent is actually composed of multiple “subagents”. Each subagent is responsible for decisions about some part of the total action space. Each subagent can be mapped to its own policy, and you can control how those policies are trained with RLlib’s policy and algorithm parameters.
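The subagent-to-policy mapping can be as simple as one policy per subagent. A minimal sketch, where the subagent IDs in SUBAGENT_IDS and the "policy_" naming scheme are hypothetical; in RLlib a function like this is supplied as policy_mapping_fn in the multi-agent part of the Trainer config, and the policies themselves still have to be defined there. The function accepts extra arguments because the arguments RLlib passes to it have varied between versions.

```python
# Hypothetical subagent IDs, one per component of the original Tuple action.
SUBAGENT_IDS = ("env_agent", "shaped_agent")


def policy_mapping_fn(agent_id, *args, **kwargs):
    # One distinct policy per subagent, so each policy only ever
    # receives (and is trained on) its own subagent's reward.
    return f"policy_{agent_id}"
```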

If you take this approach, you’ll need to:

- Combine the individual actions into a single action used to update the environment. You’re probably already doing this with your setup using a tuple action space.
- Determine what each subagent should observe. They probably should all have the same observations.
- Determine how each subagent will be rewarded. It seems like you already have an idea of this; it’s just a matter of implementing it.
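The three steps above can be sketched as a thin wrapper around an existing single-agent env with a two-component Tuple action. It follows the dict-keyed convention of RLlib's MultiAgentEnv (observations, rewards and dones keyed by agent ID, plus "__all__" in dones) but does not subclass it; base_env, shaping_fn and the two agent IDs are hypothetical.

```python
class SubagentEnv:
    """Exposes one env with a 2-tuple action as two subagents with separate rewards."""

    def __init__(self, base_env, shaping_fn):
        self.base_env = base_env      # step(tuple_action) -> (obs, reward, done, info)
        self.shaping_fn = shaping_fn  # (obs, second_action) -> float, the self-computed reward
        self.agent_ids = ("env_agent", "shaped_agent")

    def reset(self):
        obs = self.base_env.reset()
        # Every subagent sees the same observation.
        return {aid: obs for aid in self.agent_ids}

    def step(self, action_dict):
        # Recombine the subagents' actions into the original Tuple action.
        action = (action_dict["env_agent"], action_dict["shaped_agent"])
        obs, env_reward, done, info = self.base_env.step(action)
        rewards = {
            # First subagent: the environmental reward.
            "env_agent": env_reward,
            # Second subagent: only the self-computed reward.
            "shaped_agent": self.shaping_fn(obs, action_dict["shaped_agent"]),
        }
        obs_dict = {aid: obs for aid in self.agent_ids}
        dones = {aid: done for aid in self.agent_ids}
        dones["__all__"] = done
        infos = {aid: dict(info) for aid in self.agent_ids}
        return obs_dict, rewards, dones, infos
```

Because each subagent is mapped to its own policy, each policy's loss is computed only from its own reward stream, which gives the "loss per policy head" behavior without a custom loss function.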

If you need help working with multi-agent environments, check out Abmarl, which helps users connect multi-agent simulations to RLlib.

Ofir_Abu · September 29, 2021, 1:29pm

Thanks @rusu24edward this is really helpful!
