Does the DDPG algorithm in RLlib minimize or maximize return?

user1 · October 12, 2021, 3:39am

Hi, I’m new to RLlib and English is not my first language.

I want to know whether the DDPG algorithm minimizes or maximizes return, because when I train my agent, it learns in exactly the opposite direction from what I expect.

At first, I set my reward to
exp(-x^2)
so that x is driven to zero (the reward is highest at x = 0, so maximizing return should bring x to zero).
But the result looks like the top plot in the picture below.

So I thought that DDPG in RLlib minimizes return.
I therefore changed the reward to
-exp(-x^2)
But this time the result looks like the bottom plot in the picture below.

This time it learns to maximize return.
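
To make my expectation concrete, here is a minimal sketch in plain Python (not RLlib code) of which behavior gives the higher return under each of the two rewards. The two constant trajectories, the discount factor 0.99, and the function names are assumptions I made up for illustration; they are not taken from my actual environment.

```python
import math


def reward(x, sign=1.0):
    """sign * exp(-x^2): sign=+1.0 is my first reward, sign=-1.0 my second."""
    return sign * math.exp(-x ** 2)


def discounted_return(xs, sign=1.0, gamma=0.99):
    """Discounted sum of rewards along a sequence of x values."""
    return sum((gamma ** t) * reward(x, sign) for t, x in enumerate(xs))


# Hypothetical trajectories, for illustration only.
near_zero = [0.0] * 50  # x is held at 0
far_away = [3.0] * 50   # x is held far from 0

for sign in (1.0, -1.0):
    r_near = discounted_return(near_zero, sign)
    r_far = discounted_return(far_away, sign)
    higher = "near_zero" if r_near > r_far else "far_away"
    print(f"sign={sign:+.0f}: near={r_near:.3f}, far={r_far:.3f}, higher return: {higher}")
```

With exp(-x^2), the trajectory that stays at x = 0 has the higher return, and with -exp(-x^2) the trajectory far from zero has the higher return. So an algorithm that maximizes return should drive x to zero only with the first reward.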

I am confused about whether DDPG maximizes or minimizes return.

Thank you for taking the time to read this topic.
