Malformed "reparameterization trick" in squashed Gaussian

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

So I’ve been debugging SAC a lot recently, and one thing that keeps bothering me is how RLlib applies the “reparameterization trick” for the squashed Gaussian.

Everywhere I see the trick mentioned it is defined as:

$a = \tanh ({\mu_\theta + \sigma_\theta\odot\xi})$ where $\xi\thicksim N(0,I)$, then mapping to $[low,high]$

However, it seems like RLlib just does:

$a = \tanh ({a_n})$ where $a_n\thicksim N(\mu_\theta,\operatorname{diag}(\sigma_\theta^2))$, then mapping to $[low,high]$

Can anyone explain how these two are functionally the same? What’s interesting is that even though RLlib is ostensibly not using the “reparameterization trick”, SAC still seems to work with automatic differentiation.
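
For what it's worth, here is a small sanity check I put together, in case it helps frame the question. It is only a sketch using Python's standard library, not RLlib code: the names `squash_explicit` and `squash_sampled` are made up for illustration, and the scalar case stands in for the element-wise vector case. The idea is that a location-scale sampler for $N(\mu, \sigma^2)$ can itself be computed as $\mu + \sigma\xi$, so the two forms can coincide sample by sample. In PyTorch, `torch.distributions.Normal.rsample()` is built this way, which keeps the sample differentiable with respect to $\mu$ and $\sigma$. Whether RLlib's squashed Gaussian actually calls a reparameterized sampler is an assumption that would need checking against its source.

```python
import math
import random


def squash_explicit(mu, sigma, xi):
    """First form: a = tanh(mu + sigma * xi), with xi ~ N(0, 1) supplied by the caller."""
    return math.tanh(mu + sigma * xi)


def squash_sampled(mu, sigma, rng):
    """Second form: a = tanh(a_n), with a_n drawn directly from N(mu, sigma^2)."""
    return math.tanh(rng.gauss(mu, sigma))


mu, sigma, seed = 0.3, 0.8, 0

# Two generators with the same seed consume the same underlying standard-normal draw.
# (CPython's random.gauss computes mu + z * sigma internally; this is an
# implementation detail of the standard library, used here only for illustration.)
xi = random.Random(seed).gauss(0.0, 1.0)
a_explicit = squash_explicit(mu, sigma, xi)
a_sampled = squash_sampled(mu, sigma, random.Random(seed))

# With xi held fixed, a is a deterministic, differentiable function of mu.
# Compare the analytic pathwise derivative da/dmu = 1 - tanh(.)^2 against a
# central finite difference.
h = 1e-6
grad_analytic = 1.0 - a_explicit ** 2
grad_numeric = (
    squash_explicit(mu + h, sigma, xi) - squash_explicit(mu - h, sigma, xi)
) / (2 * h)

print(a_explicit, a_sampled)
print(grad_analytic, grad_numeric)
```

If the sampler were not reparameterized (for example, PyTorch's `Normal.sample()`, which runs under `torch.no_grad()`), the sample would still follow the same distribution, but no gradient would flow back to $\mu_\theta$ and $\sigma_\theta$. So the two forms differ only in whether the noise is exposed as an explicit input to the computation graph.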