Is there support for training multi-objective PPO and DQN agents in RLlib? Is there a tuned example I can refer to, if any such thing is feasible?
Hi @vishalrangras,
It depends on what you mean by multi-objective.
If you mean multiple rewards, then most approaches combine them (possibly weighted) in the environment and return that single scalar reward. I do not know of any examples in RLlib that handle multiple rewards for the same agent on a single timestep.
On the other hand, the loss function of virtually every RL algorithm is multi-objective.
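For the weighted-sum approach, a minimal sketch could look like the wrapper below. It assumes your environment's step() returns a per-objective reward vector; the wrapper name and weights are just illustrative, not an RLlib API.

```python
import gym
import numpy as np


class ScalarizedRewardWrapper(gym.Wrapper):
    """Turns a per-objective reward vector into a single weighted-sum reward."""

    def __init__(self, env, weights):
        super().__init__(env)
        self.weights = np.asarray(weights, dtype=np.float32)

    def step(self, action):
        obs, reward_vec, done, info = self.env.step(action)
        # Keep the original vector around for logging/analysis.
        info["reward_vector"] = reward_vec
        scalar_reward = float(np.dot(self.weights, reward_vec))
        return obs, scalar_reward, done, info


# Usage: wrap the multi-objective env before handing it to RLlib, e.g.
# env = ScalarizedRewardWrapper(MyMultiObjectiveEnv(), weights=[0.7, 0.3])
```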
Thanks for your response. The approach you mentioned is referred to in the literature as linear scalarization, or a weighted sum of the rewards, I believe. I am currently implementing that in the code of my environment class. However, I have read a few papers on multi-objective RL that propose vectorizing the reward and doing things differently. The papers are not very clear about the code implementation, so I am not sure whether the same agent is supposed to receive multiple separate rewards, one per objective, or not. Also, the papers refer to the rewards, not the loss function.
I was hoping RLlib already had some code or algorithm in place that I could refer to, in order to better understand the implementation side of multi-objectivity and apply it to my problem.
I have not seen any such examples in RLlib. Do you have a paper reference for the approach you are interested in?
This is the paper I refer to: [1803.02965] A Multi-Objective Deep Reinforcement Learning Framework
You will see the section on computing separate DQN Q-values for the different objectives; those Q-values are then combined for the policy update, if my understanding is correct.
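Roughly, I understand it as something like the sketch below: one Q-value head per objective, with the per-objective Q-values combined by a preference/weight vector only when selecting actions. The names and the weighted-sum combination here are my own assumptions, not the paper's exact algorithm or anything from RLlib.

```python
import torch
import torch.nn as nn


class MultiObjectiveQNet(nn.Module):
    """Shared trunk with one Q-value head per objective."""

    def __init__(self, obs_dim, num_actions, num_objectives, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, num_actions) for _ in range(num_objectives)]
        )

    def forward(self, obs):
        z = self.trunk(obs)
        # Shape: [batch, num_objectives, num_actions]
        return torch.stack([head(z) for head in self.heads], dim=1)


def greedy_action(q_net, obs, weights):
    # Scalarize the per-objective Q-values with a preference vector
    # before taking the argmax over actions.
    q_values = q_net(obs)                              # [batch, n_obj, n_act]
    scalarized = (weights.view(1, -1, 1) * q_values).sum(dim=1)
    return scalarized.argmax(dim=-1)
```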
I'm also interested in multi-objective RL.
Here is another example paper that comes with a code implementation: [1908.08342] A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Thanks for sharing the paper. I will take a look; it should definitely be helpful.