Malformed "reparameterization trick" in squashed gaussian

grrsausage · March 6, 2023, 6:50pm

How severe does this issue affect your experience of using Ray?

None: Just asking a question out of curiosity

I’ve been doing a lot of debugging with SAC recently, and one thorn in my side is how RLlib applies the “reparameterization trick” for the squashed Gaussian.

Everywhere I see the trick mentioned it is defined as:

$a = \tanh ({\mu_\theta + \sigma_\theta\odot\xi})$ where $\xi\thicksim N(0,I)$, then mapping to $[low,high]$

However, it seems like RLlib just does:

$a = \tanh ({a_n})$ where $a_n\thicksim N(\mu_\theta,\sigma_\theta\odot I)$ then mapping to $[low,high]$

Can anyone elucidate how these two are functionally the same? What’s interesting is that, even though RLlib is ostensibly not using the “reparameterization trick”, SAC still seems to work with automatic differentiation.
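
To make the comparison concrete, here is a minimal sketch of the two forms in plain Python, not RLlib code. The helper names (`squash`, `sample_explicit`, `sample_direct`) and the values chosen for `mu`, `sigma`, `low`, and `high` are made up for illustration, and `sigma` is treated as a standard deviation. It only checks that the two samplers produce matching statistics; it says nothing about how gradients are computed.

```python
import math
import random
import statistics


def squash(u, low, high):
    """Apply tanh, then map (-1, 1) linearly onto (low, high)."""
    return low + 0.5 * (math.tanh(u) + 1.0) * (high - low)


def sample_explicit(mu, sigma, low, high, rng):
    """First form: a = tanh(mu + sigma * xi) with xi ~ N(0, 1)."""
    xi = rng.gauss(0.0, 1.0)
    return squash(mu + sigma * xi, low, high)


def sample_direct(mu, sigma, low, high, rng):
    """Second form: a = tanh(a_n) with a_n drawn from N(mu, sigma)."""
    a_n = rng.gauss(mu, sigma)
    return squash(a_n, low, high)


def summarize(sampler, seed, n=100_000):
    """Return the empirical mean and standard deviation of n draws."""
    rng = random.Random(seed)
    # Illustrative parameters: mu=0.3, sigma=0.7, action range [-2, 2].
    draws = [sampler(0.3, 0.7, -2.0, 2.0, rng) for _ in range(n)]
    return statistics.fmean(draws), statistics.pstdev(draws)


# Different seeds, so the comparison is statistical rather than draw-by-draw.
explicit_mean, explicit_std = summarize(sample_explicit, seed=1)
direct_mean, direct_std = summarize(sample_direct, seed=2)

print(f"explicit: mean={explicit_mean:.3f} std={explicit_std:.3f}")
print(f"direct:   mean={direct_mean:.3f} std={direct_std:.3f}")
```

If the statistics agree, the two forms define the same action distribution, so any difference would have to come from how the sample is differentiated rather than from what is sampled.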

arturn · April 13, 2023, 11:54pm

Hi @grrsausage, can you file a GitHub issue for this, please?

It would be great if you could provide a script that demonstrates that the SAC loss we provide is “wrong”. Call it and the “correct” implementation with the same inputs and show that they differ.
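
A comparison along those lines could look roughly like the sketch below. It is a stand-in in plain Python rather than a test of RLlib's loss: the function names are hypothetical, the mapping to [low, high] is left out because it is a fixed affine map, and it assumes that CPython's `random.gauss(mu, sigma)` draws a standard normal value and then shifts and scales it. Both forms get the same `mu`, `sigma`, and seed, and the derivatives of the direct form are estimated by finite differences with the seed held fixed.

```python
import math
import random


def explicit_action(mu, sigma, seed):
    """Reference form: a = tanh(mu + sigma * xi) with xi ~ N(0, 1)."""
    xi = random.Random(seed).gauss(0.0, 1.0)
    return math.tanh(mu + sigma * xi), xi


def direct_action(mu, sigma, seed):
    """Form under test: a = tanh(a_n) with a_n drawn from N(mu, sigma)."""
    return math.tanh(random.Random(seed).gauss(mu, sigma))


def finite_difference(f, x, h=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Illustrative inputs, shared by both forms.
mu, sigma, seed = 0.3, 0.7, 123

a_explicit, xi = explicit_action(mu, sigma, seed)
a_direct = direct_action(mu, sigma, seed)

# Analytic pathwise derivatives of the explicit form.
dtanh = 1.0 - a_explicit ** 2
grad_mu_explicit = dtanh
grad_sigma_explicit = dtanh * xi

# Numerical derivatives of the direct form, same seed for every evaluation.
grad_mu_direct = finite_difference(lambda m: direct_action(m, sigma, seed), mu)
grad_sigma_direct = finite_difference(lambda s: direct_action(mu, s, seed), sigma)

print("action difference:    ", abs(a_explicit - a_direct))
print("d/dmu difference:     ", abs(grad_mu_explicit - grad_mu_direct))
print("d/dsigma difference:  ", abs(grad_sigma_explicit - grad_sigma_direct))
```

Under that assumption the outputs and derivatives agree, because a sampler written as a shift and scale of standard normal noise is already the reparameterized form. Whether the sampler RLlib calls behaves this way would need to be checked against its source.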

This might need some discussion.

grrsausage · October 13, 2023, 1:49pm

So sorry for the delay, @arturn! I will open an issue now.
