How can I train a multi-objective model in RLlib?

Hello,
I have a model with rewards for two different objectives. Is there a way to train a policy in RLlib that achieves a trade-off between these two objectives (e.g., by finding the Pareto front)? Is there any multi-objective RL algorithm implemented in RLlib? If not, how can I add a multi-objective algorithm suitable for my environment to RLlib?

Hey @saeid93 , great question. There is no algorithm in RLlib that was designed especially for that purpose (multi-reward envs). You could add a post-processing step to merge the rewards into one by providing a custom callback and overriding the on_postprocess_trajectory method. In there, you can change the incoming batch’s “rewards” key.

See here for an example:

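A minimal sketch of what such a callback could look like. This assumes the env returns one objective as the regular step reward and writes the second one into each step's info dict under a hypothetical `reward_2` key; the weights `W1`/`W2` are placeholder scalarization coefficients, and the exact import path depends on your RLlib version (older versions use `ray.rllib.agents.callbacks`):

```python
import numpy as np

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.sample_batch import SampleBatch

# Hypothetical trade-off weights for the two objectives.
W1, W2 = 1.0, 0.5


class ScalarizeRewardsCallback(DefaultCallbacks):
    def on_postprocess_trajectory(
        self,
        *,
        worker,
        episode,
        agent_id,
        policy_id,
        policies,
        postprocessed_batch,
        original_batches,
        **kwargs,
    ):
        # Second objective, assumed to be written by the env into each
        # step's info dict under the (hypothetical) key "reward_2".
        reward_2 = np.array(
            [
                info.get("reward_2", 0.0) if isinstance(info, dict) else 0.0
                for info in postprocessed_batch[SampleBatch.INFOS]
            ]
        )
        # Overwrite the batch's "rewards" column with a weighted combination.
        postprocessed_batch[SampleBatch.REWARDS] = (
            W1 * postprocessed_batch[SampleBatch.REWARDS] + W2 * reward_2
        )
```

You would then register the callback class in your config, e.g. `config.callbacks(ScalarizeRewardsCallback)` with the config-object API, or `"callbacks": ScalarizeRewardsCallback` in the old config dict.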

Thanks, @sven1977. That's a nice way of doing it! Just to confirm: by "merge the rewards into one", do you mean summing up both rewards? I'm currently doing something similar: I sum the rewards inside my environment and pass the total to Ray, and I also pass each reward separately in the info dict as a custom metric to a monitoring callback so I can view them in TensorBoard. Does your solution differ from what I am doing now? As far as I understand, both approaches sum the two objectives' rewards.
Summing the rewards works to some extent, but the problem is that the two rewards come from different ranges, and even normalizing them does not produce a smooth curve. If I want to move beyond summing, is there a particular RL approach you would suggest implementing in RLlib?
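For context, a simplified sketch of my current monitoring setup (the info keys `reward_1`/`reward_2` are just placeholders for whatever my env reports):

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class PerObjectiveMetricsCallback(DefaultCallbacks):
    """Copies the per-objective rewards from the info dict into custom
    metrics so they show up as separate curves in TensorBoard."""

    def on_episode_step(self, *, worker, base_env, policies=None, episode, **kwargs):
        info = episode.last_info_for() or {}
        # Accumulate each objective's reward over the episode.
        episode.user_data.setdefault("reward_1", 0.0)
        episode.user_data.setdefault("reward_2", 0.0)
        episode.user_data["reward_1"] += info.get("reward_1", 0.0)
        episode.user_data["reward_2"] += info.get("reward_2", 0.0)

    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # Reported once per episode; RLlib averages these for TensorBoard.
        episode.custom_metrics["episode_reward_1"] = episode.user_data["reward_1"]
        episode.custom_metrics["episode_reward_2"] = episode.user_data["reward_2"]
```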

Hi @saeid93,

Have a look at this related post.