Is there a support for training multi-objective based PPO and DQN agent in RLlib? Can I refer to some tuned example if any such thing is feasible?
That depends on what you mean by multi-objective.
If you mean multiple rewards, then most approaches combine them (possibly weighted) in the environment and return that single scalar reward. I do not know of any examples in RLlib that handle multiple rewards for the same agent on a single timestep.
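The combine-in-the-environment approach can be sketched as a simple wrapper. This is a hypothetical illustration, not an RLlib API; the class name, the vector-reward convention, and the `reward_vector` info key are all assumptions for the example:

```python
import numpy as np

class WeightedSumRewardWrapper:
    """Hypothetical wrapper around an env whose step() returns a vector
    reward; collapses it to a scalar via a fixed weight vector so that
    a standard single-objective agent (e.g. PPO or DQN) can train on it."""

    def __init__(self, env, weights):
        self.env = env
        self.weights = np.asarray(weights, dtype=float)

    def step(self, action):
        obs, reward_vec, done, info = self.env.step(action)
        # Keep the per-objective rewards around for logging/analysis.
        info["reward_vector"] = np.asarray(reward_vec, dtype=float)
        # Linear scalarization: r = w . r_vec
        scalar_reward = float(np.dot(self.weights, reward_vec))
        return obs, scalar_reward, done, info
```

With weights `[0.5, 0.25]` and a vector reward `[1.0, 2.0]`, the agent would see the single scalar `0.5 * 1.0 + 0.25 * 2.0 = 1.0`.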
On the other hand, the loss function of virtually every RL algorithm is multi-objective.
Thanks for your response. The approach you mentioned is referred to in the literature as linearization, or a weighted sum of the rewards, I believe. I am currently implementing that in my environment class. However, I have read a few papers on multi-objective RL that propose vectorizing the reward and doing things differently. The papers are not very clear about the implementation, so I am not sure whether the same agent is supposed to receive a separate reward for each objective. Also, those papers refer to the rewards, not the loss function.
I was hoping RLlib already had some code/algorithm in place that I could refer to, in order to better understand the implementation side of multi-objectivity and apply it to my problem.
I have not seen any such examples in RLlib. Do you have a paper reference for the approach you are interested in?
This is the paper I refer to: [1803.02965] A Multi-Objective Deep Reinforcement Learning Framework
You will see the section on computing separate Q-values in DQN for the different objectives; those Q-values are then combined for the policy update, if my understanding is correct.
I’m also interested in multi-objective RL.
Here is another example paper that comes with a code implementation: [1908.08342] A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
Thanks for sharing the paper. I will take a look, it should definitely be helpful.