Hi, I'm new to RLlib, and English is not my first language.
I want to know whether the DDPG algorithm minimizes or maximizes return, because when I train my agent, it learns in exactly the opposite direction.
At first, I set my reward as,
to drive x to zero (which means maximizing return).
But the result is shown at the top of the picture below.
So I think DDPG in RLlib minimizes return.
Thus, I set the reward as below.
But this time, the result is shown at the bottom of the picture below.
It learns to maximize return.
I am confused about whether DDPG maximizes or minimizes return.
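To make the two sign conventions concrete, here is a minimal sketch in plain Python. It does not use RLlib. The two reward functions are hypothetical stand-ins (not the actual rewards from my experiments), and the two trajectories of x are made up for illustration. It only shows which trajectory an agent would prefer if it maximizes discounted return, which is what standard RL formulations assume.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over the episode, computed backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical reward: largest (zero) when x is zero.
def reward_negative_square(x):
    return -x * x


# Hypothetical reward with the sign flipped: grows as x moves away from zero.
def reward_positive_square(x):
    return x * x


# Made-up trajectories of x over one short episode.
toward_zero = [1.0, 0.5, 0.25, 0.0]
away_from_zero = [1.0, 1.5, 2.0, 2.5]

for name, reward_fn in [
    ("negative square", reward_negative_square),
    ("positive square", reward_positive_square),
]:
    g_toward = discounted_return([reward_fn(x) for x in toward_zero])
    g_away = discounted_return([reward_fn(x) for x in away_from_zero])
    preferred = "toward zero" if g_toward > g_away else "away from zero"
    print(f"{name}: a return-maximizing agent prefers moving {preferred}")
```

If the trained agent behaves the opposite way from what this kind of check predicts, the sign of the reward as it is actually returned by the environment's step function may be worth checking first.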
Thanks for taking the time to read this.