Hi, I’m new to RLlib and not very familiar with English.

I want to know whether the DDPG algorithm in RLlib minimizes or maximizes return, because when I train my agent, it learns in exactly the opposite direction from what I expect.

At first, I set my reward as

`exp(-x^2)`

to drive x toward zero (which should mean maximizing return).
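To illustrate the intended shaping, here is a minimal sketch of such a reward function (the name `reward` and the use of a scalar `x` are my assumptions; the actual environment is not shown):

```python
import math

def reward(x: float) -> float:
    # Hypothetical reward: peaks at 1.0 when x == 0 and decays toward 0
    # as |x| grows, so an agent that MAXIMIZES return should push x to 0.
    return math.exp(-x ** 2)

r_at_zero = reward(0.0)   # largest possible reward
r_far = reward(2.0)       # much smaller reward away from 0
```

With this shape, larger returns correspond to trajectories where x stays near zero.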

But the result is shown at the top of the picture below.

So I thought DDPG in RLlib minimizes return.

Thus I set the reward to

`-exp(-x^2)`

But this time, the result is shown at the bottom of the picture below.

Now it learns to maximize return.

I am confused about whether DDPG maximizes or minimizes return.

Thanks for spending your time on this topic.