Multi reward optimization

Is there a way to train an agent with multiple actions in its action space (a tuple action space) such that each action is trained with a different loss function?

Seems like a complicated setup. Why don’t you create 2 different environments with different reward functions and then see which Trainer (e.g. in a tune search) performs better:

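  from ray import tune

  config = {
      "env": MyEnvClass,
      # Assumes MyEnvClass reads "reward_function" from its env_config.
      "env_config": {"reward_function": tune.grid_search(["A", "B"])},
  }

  trial_results = tune.run("PPO", config=config)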

Thanks for the quick response!
The reason is that the different rewards don’t represent rewards for the same task.

For example, I would like one reward to be the environmental reward and the other to be a different, self-computed one that tries to encourage a certain behavior.

Can you combine the rewards linearly or something, so that the agent will be aware of all these tradeoffs?
Are you training multiple branches of your NN with these loss functions separately?
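For reference, a linear combination could look like the minimal sketch below. The function name `combine_rewards` and the `shaping_weight` parameter are hypothetical, and the numbers are made up for illustration; the weight would need tuning for a real task.

```python
def combine_rewards(env_reward, shaping_reward, shaping_weight=0.1):
    """Return a weighted sum of the environment reward and a self-computed one."""
    return env_reward + shaping_weight * shaping_reward


# Hypothetical per-step values, for illustration only.
total = combine_rewards(env_reward=1.0, shaping_reward=-2.0, shaping_weight=0.5)
print(total)
```

With a single combined reward, every part of the action is trained against the same signal, so this does not give each action its own objective.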

Yes, that’s a good idea, but I’m really interested in the case where the second kind of action is driven only by the second reward. Is there a way to define a “loss per policy head”?

You may be able to break this down as a multiagent setup. You have a single agent interacting with the simulation, and that agent is actually composed of multiple “subagents”. Each subagent is responsible for decisions about some part of the total action space. Each subagent can be mapped to its own policy, and you can control how those policies are trained with RLlib’s policy and algorithm parameters.

If you take this approach, you’ll need to:

  1. Combine the individual actions into a single action used to update the environment. You’re probably already doing this with your setup using a tuple action space.
  2. Determine what each subagent should observe. They probably should all have the same observations.
  3. Determine how each subagent will be rewarded. It seems like you already have an idea of this, so it’s just a matter of implementing it.
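The three steps above can be sketched in plain Python as follows. This is only an illustration under assumptions: `ToyEnv`, `SubagentWrapper`, and the agent IDs `"mover"` and `"signaler"` are hypothetical names, and the wrapper merely mimics the dict-per-agent convention (including the `"__all__"` done key) rather than subclassing RLlib's `MultiAgentEnv`.

```python
class ToyEnv:
    """Hypothetical single-agent environment with a 2-part (tuple) action."""

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        move, signal = action  # the two components of the tuple action
        self.position += move
        env_reward = float(self.position)  # reward from the task itself
        behavior_reward = 1.0 if signal == 1 else 0.0  # self-computed reward
        done = self.position >= 3
        return self.position, (env_reward, behavior_reward), done


class SubagentWrapper:
    """Exposes one tuple-action env as several subagents, one per action part."""

    def __init__(self, env, agent_ids=("mover", "signaler")):
        self.env = env
        self.agent_ids = agent_ids

    def reset(self):
        obs = self.env.reset()
        # Step 2: every subagent sees the same observation.
        return {agent_id: obs for agent_id in self.agent_ids}

    def step(self, action_dict):
        # Step 1: merge the per-subagent actions into a single tuple action.
        joint_action = tuple(action_dict[agent_id] for agent_id in self.agent_ids)
        obs, rewards, done = self.env.step(joint_action)
        # Step 3: each subagent receives only its own reward signal.
        obs_dict = {agent_id: obs for agent_id in self.agent_ids}
        reward_dict = dict(zip(self.agent_ids, rewards))
        done_dict = {"__all__": done}
        return obs_dict, reward_dict, done_dict


env = SubagentWrapper(ToyEnv())
obs = env.reset()
obs, rewards, dones = env.step({"mover": 1, "signaler": 1})
print(obs, rewards, dones)
```

In an actual RLlib setup, each agent ID would then be mapped to its own policy through the multi-agent configuration, so that each policy is trained only on the reward its subagent receives.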

If you need help working with multi-agent environments, check out Abmarl, which helps users connect multi-agent simulations to RLlib.

Thanks @rusu24edward this is really helpful!