How severe does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Hello, I am trying to use SAC with a custom toy environment. PPO solves it very efficiently and robustly, achieving optimal performance right away. However, I am struggling to tune SAC.
SAC achieves the optimal episode_reward_max right away, but episode_reward_min stays low, around 30% of the optimal, and never improves. The mean is about 80% of the optimal, which suggests that only a small fraction of episodes are responsible for the low min. This happens in both training and evaluation.
This doesn’t happen for PPO, which obtains optimal values for episode_reward_min and episode_reward_max efficiently.
I have tried tuning SAC, but I cannot get rid of these rogue episodes or raise the low min reward. The best performance I have found is with a low initial alpha of 0.1.
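For reference, this is roughly the kind of config I am experimenting with. A minimal sketch using the legacy dict-style RLlib config; `MyToyEnv` is a placeholder for my custom environment, and the `initial_alpha` value is just the setting that worked best for me, not a recommendation:

```python
# Sketch of the SAC tuning I tried (legacy RLlib dict config style).
# "MyToyEnv" is a hypothetical placeholder for the custom toy environment.
sac_config = {
    "env": "MyToyEnv",
    # Lowering the initial entropy coefficient gave the best results in my runs.
    "initial_alpha": 0.1,
    # Let SAC keep tuning alpha automatically toward the target entropy.
    "target_entropy": "auto",
}

print(sac_config["initial_alpha"])
```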
Is this expected behaviour? Could it be caused by stochastic action sampling during evaluation, which I should make deterministic?
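To test the stochasticity hypothesis, I was thinking of forcing greedy actions during evaluation. A minimal sketch of the evaluation overrides I would add, assuming the dict-style RLlib config where `"explore": False` in `evaluation_config` disables exploration noise for evaluation rollouts (the episode count is an arbitrary choice):

```python
# Hypothetical evaluation overrides to make evaluation deterministic.
eval_overrides = {
    "evaluation_interval": 1,        # evaluate after every training iteration
    "evaluation_num_episodes": 20,   # arbitrary; enough episodes to see the min
    "evaluation_config": {
        "explore": False,            # greedy (deterministic) actions at eval time
    },
}

print(eval_overrides["evaluation_config"]["explore"])
```

If the min reward recovers with `explore=False` but not during training, that would point to sampling noise rather than a genuinely bad policy.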