I have a model with rewards for two different objectives. Is there a way to train a policy in RLlib that achieves a trade-off between these two objectives (for example, by finding the Pareto front)? Is there a multi-objective RL algorithm implemented in RLlib? If not, how can I add a multi-objective algorithm suitable for my environment to RLlib?
Hey @saeid93, great question. There is no algorithm in RLlib designed specifically for that purpose (multi-reward envs). You could add a post-processing step that merges the rewards into one by providing a custom callback and overriding its on_postprocess_trajectory method. In there, you can change the incoming batch’s “rewards” key.
See here for an example:
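A minimal sketch of the merging step, assuming the environment reports each objective's reward in its info dict under the hypothetical keys `reward_obj1` and `reward_obj2`. The hypothetical helper `merge_objective_rewards` computes a weighted sum per timestep. The commented class shows where it could be called from `on_postprocess_trajectory`. The import path for `DefaultCallbacks` differs between RLlib versions (e.g. `ray.rllib.agents.callbacks` in older releases, `ray.rllib.algorithms.callbacks` in newer ones), so check your installed version.

```python
import numpy as np

# Hypothetical info keys: the env is assumed to put each objective's
# per-step reward into its info dict under these names.
OBJECTIVE_KEYS = ("reward_obj1", "reward_obj2")


def merge_objective_rewards(infos, weights, keys=OBJECTIVE_KEYS):
    """Collapse per-objective rewards from a sequence of info dicts into
    one scalar reward per timestep via a weighted sum."""
    weights = np.asarray(weights, dtype=np.float32)
    if weights.shape != (len(keys),):
        raise ValueError("need exactly one weight per objective")
    # Shape (T, num_objectives): one row per timestep; reshape keeps the
    # 2-D shape even for an empty batch.
    per_objective = np.array(
        [[info[k] for k in keys] for info in infos], dtype=np.float32
    ).reshape(-1, len(keys))
    # Weighted sum over objectives -> shape (T,).
    return per_objective @ weights


# Hypothetical use inside an RLlib callback (not executed here):
#
# class MergeRewardsCallback(DefaultCallbacks):
#     def on_postprocess_trajectory(self, *, postprocessed_batch, **kwargs):
#         postprocessed_batch["rewards"] = merge_objective_rewards(
#             postprocessed_batch["infos"], weights=(0.7, 0.3)
#         )
```

One thing to verify: RLlib calls `on_postprocess_trajectory` after the policy's own postprocessing, so for algorithms that compute advantages or value targets there (such as PPO with GAE), those may already be based on the original rewards. Check this against your RLlib version before relying on it.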
Thanks, @sven1977. That’s a nice way of doing it! Could you confirm that by “merge the rewards into one” you mean summing them up? I am currently doing the same thing: I sum the rewards in my environment and pass the total to Ray. I also put each reward separately in my info dict and report them as custom metrics in my monitoring callback so I can view them in TensorBoard. Does your solution make any difference compared to what I am doing now? As far as I understand, both approaches sum the rewards of the two objectives.
Summing up the rewards works to some extent, but the problem is that the rewards come from different ranges, and even normalizing them does not give a smooth curve. If I want to go beyond summing, is there a particular RL approach you would suggest implementing in RLlib?
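One common baseline for approximating a Pareto front (linear scalarization with a weight sweep, not an RLlib feature) is to train one policy per weight vector, evaluate each policy's mean return on every objective, and keep only the non-dominated results. A known limitation is that a weighted sum can only recover points on the convex part of the front. The sketch below covers only the bookkeeping. The helpers `weight_grid`, `dominates`, and `pareto_front` are hypothetical, and the training runs themselves (e.g. one Ray Tune trial per weight vector) are assumed to happen elsewhere.

```python
from typing import List, Sequence, Tuple

Weights = Tuple[float, float]


def weight_grid(num_points: int) -> List[Weights]:
    """Evenly spaced weight vectors (w, 1 - w) for two objectives."""
    if num_points < 2:
        raise ValueError("num_points must be >= 2")
    return [
        (i / (num_points - 1), 1 - i / (num_points - 1))
        for i in range(num_points)
    ]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(results):
    """Keep the non-dominated (weights, returns) pairs.

    `returns` holds the mean episode return per objective for the policy
    trained with `weights`. A point never dominates itself, so it is only
    removed when some other result dominates it.
    """
    return [
        (w, r)
        for w, r in results
        if not any(dominates(other, r) for _, other in results)
    ]
```

Because each policy is trained on a fixed weighting, the resulting set also shows how sensitive the trade-off is to the relative scale of the two rewards, which is the issue with differing reward ranges.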
Have a look at this related post.