SAC Agent 'Forgets' During Training

Stale_neutrino · September 13, 2022, 3:09pm

Hey guys,

I’m currently training a SAC agent (4D continuous action space in [-1, 1], 19D unbounded observation space). It produces successful episodes fairly quickly, but after a while its performance degrades (see graph below).

The horizontal axis is the episode number.

For this agent I’m using the default SAC config
Trained for 155005 steps

What is the best course of action when an agent exhibits this behavior?

Ray version: 2.0
Python: 3.9

arturn · September 13, 2022, 5:20pm

Hi @Stale_neutrino,

What parameters have you tuned? What came out of that?
tau?
target_network_update_freq?
LR schedule?
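
For reference, here is a minimal sketch of what tau controls, assuming the standard soft (Polyak) target update used in SAC. It is plain Python for illustration, not RLlib's implementation, and the names soft_update, target and online are hypothetical.

```python
def soft_update(target, online, tau):
    """Move each target weight a fraction tau toward the online weight."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# Hypothetical weights, chosen only to show how the update behaves.
target = [0.0, 0.0]
online = [1.0, -2.0]

# Small tau: the target network trails the online network slowly.
# tau = 1.0 would copy the online weights in a single step.
for _ in range(3):
    target = soft_update(target, online, tau=0.1)

print(target)  # roughly 27% of the way to the online weights after 3 updates
```

A smaller tau (or less frequent target updates) makes the Q-targets change more slowly, which can trade learning speed for stability.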

@avnishn can we get some tuning advice from you?

avnishn · September 13, 2022, 6:12pm

I’d need to see more data here other than reward curves, so I’ll give some general advice.

Online RL fails to learn when agents don’t explore the optimal behavior. So with that said, we can look at the hparams that control exploration.

In RLlib they are called:
log_alpha_value
alpha_value

Some other metrics that would be useful to look at are mean_q, target_entropy, actor_loss, and critic_loss.
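
To make the link between these values concrete, here is a rough sketch of the entropy temperature in SAC, assuming the formulation from the SAC paper: alpha = exp(log_alpha), and alpha is trained so that policy entropy tracks target_entropy. A common heuristic sets the target to -dim(A), which would be -4 for the 4D action space in this thread. The function names are hypothetical and this is not RLlib's code.

```python
import math


def heuristic_target_entropy(action_dim):
    """Common SAC heuristic (an assumption here): target entropy = -dim(A)."""
    return -float(action_dim)


def alpha_loss(log_alpha, log_probs, target_entropy):
    """Temperature loss J(alpha) = E[-alpha * (log pi(a|s) + target_entropy)]."""
    alpha = math.exp(log_alpha)
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return -alpha * mean_term


target_entropy = heuristic_target_entropy(4)  # 4D action space from the thread

# Made-up log-probabilities of sampled actions. Entropy is about -5 here,
# which is below the target of -4, so the policy is too deterministic.
log_probs = [5.0, 5.0]
loss = alpha_loss(log_alpha=0.0, log_probs=log_probs, target_entropy=target_entropy)
print(target_entropy, loss)
```

When the loss decreases as alpha increases, as in this example, gradient descent raises alpha and pushes the policy toward more exploration. If alpha_value falls toward zero right before the reward drops, that points to exploration collapsing.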

Also, can you use TensorBoard.dev and share your TensorBoard logs?

Stale_neutrino · September 13, 2022, 6:45pm

@avnishn TensorBoard.dev - Upload and Share ML Experiments for Free

Stale_neutrino · September 13, 2022, 6:47pm

Hey @arturn,

I left all params at their defaults. That said, I should probably run a Tune session for this agent. Besides the params you listed, which others should I tune, and what should their ranges be?

Thanks

arturn · September 13, 2022, 7:09pm

Hi @Stale_neutrino ,

Have a look at our tuned examples section in the repo to find some examples of what parameters we modified in the past and also to find out a good starting point for a hopefully similar problem.

Cheers
