# Does the DDPG algorithm in RLlib minimize or maximize return?

Hi, I’m new to RLlib, and English is not my first language.

I want to know whether the DDPG algorithm minimizes or maximizes return, because when I train my agent, it learns in exactly the opposite direction from what I expect.

At first, I set my reward to
`exp(-x^2)`
to drive x to zero. (This reward is largest at x = 0, so it assumes return is maximized.)
But the result is shown in the top part of the picture below.

So I thought DDPG in RLlib minimizes return.
I therefore changed the reward to
`-exp(-x^2)`
But this time, the result is shown in the bottom part of the picture below.

This time, it learned to maximize return.
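
For reference, here is a minimal sketch of the two reward shapes described above, assuming x is a single scalar. The function names `reward_positive` and `reward_negative` are only for illustration and are not taken from my environment code or from RLlib.

```python
import math


def reward_positive(x: float) -> float:
    """First reward: exp(-x^2). Largest (1.0) at x = 0, approaches 0 as |x| grows."""
    return math.exp(-x ** 2)


def reward_negative(x: float) -> float:
    """Second reward: -exp(-x^2). Smallest (-1.0) at x = 0, approaches 0 as |x| grows."""
    return -math.exp(-x ** 2)


if __name__ == "__main__":
    # Print both rewards at a few sample values of x (illustrative values only).
    for x in (0.0, 0.5, 1.0, 2.0):
        print(f"x={x:.1f}  exp(-x^2)={reward_positive(x):.4f}  "
              f"-exp(-x^2)={reward_negative(x):.4f}")
```

With the first reward, an agent that maximizes return should push x toward zero; with the second, a maximizing agent should push x away from zero.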

I am confused about whether DDPG maximizes or minimizes return.

Thanks for taking the time to read this.