SAC Agent 'Forgets' During Training

Hey guys,

I’m currently training a SAC agent (4D continuous action space in [-1, 1], 19D unbounded observation space). It learns to produce successful episodes fairly quickly, but then its performance degrades after a while (see graph below).

  • horizontal axis is the episode number

For this agent I’m using the default SAC config, trained for 155005 steps.

What is the best course of action when an agent exhibits this behavior?

Ray version: 2.0
Python: 3.9

Hi @Stale_neutrino,

  • What parameters have you tuned? What came out of that?
  • tau?
  • target_network_update_freq?
  • LR schedule?
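For reference, those knobs all live on the SAC config. Below is a minimal sketch of overriding them via RLlib's dict-style config in Ray 2.0; the values are placeholders for illustration, not tuned recommendations, and the environment name is just a stand-in:

```python
# Sketch: overriding the SAC parameters mentioned above (Ray 2.0,
# dict-style config). Values are placeholders, not recommendations.
sac_config = {
    "env": "Pendulum-v1",             # stand-in environment
    "tau": 5e-3,                      # soft target-network update coefficient
    "target_network_update_freq": 1,  # env steps between target-net updates
    "optimization": {                 # per-component learning rates
        "actor_learning_rate": 3e-4,
        "critic_learning_rate": 3e-4,
        "entropy_learning_rate": 3e-4,
    },
}
```

An LR schedule would then be layered on top of these base rates; check the SAC config reference for the exact schedule keys in your Ray version.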

@avnishn can we get some tuning advice from you?

I’d need to see more data here than just reward curves, so I’ll give some general advice.

Online RL fails to learn when the agent doesn’t explore the optimal behavior. So with that said, we can look at the hparams that control exploration.

In RLlib these are part of the SAC config. Some other quantities that would be nice to look at are mean_q, target_entropy, actor_loss, and critic_loss.
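For concreteness, here is one hedged way to pull those stats out of a training result dict. The nested path below matches what RLlib typically reports under `info/learner`, but verify it against your own results, since the exact layout can differ by version; the `fake_result` is synthetic, for illustration only:

```python
def get_learner_stats(result, policy_id="default_policy"):
    """Extract per-policy learner stats (mean_q, actor_loss, ...) from an
    RLlib train() result dict. This nesting is what RLlib commonly reports;
    adjust the path if your results differ."""
    return result["info"]["learner"][policy_id]["learner_stats"]

# Tiny synthetic result shaped like RLlib's output, for illustration:
fake_result = {
    "info": {"learner": {"default_policy": {"learner_stats": {
        "mean_q": 12.3,
        "actor_loss": -0.5,
        "critic_loss": 1.7,
        "alpha_value": 0.2,
    }}}}
}
stats = get_learner_stats(fake_result)
```

Watching mean_q and critic_loss over time is a quick way to spot a diverging critic, which often precedes the kind of performance collapse described above.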

Also, can you upload your TensorBoard logs to TensorBoard.dev and share them?


Hey @arturn,

I left all params at their defaults; that being said, I should probably run a tune session for this agent. Besides the params you listed, which others should I tune, and what should their ranges be?
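As a starting point for such a tune session, here is a plain-Python sketch of a random-search space over the SAC knobs discussed above. The ranges are common starting points, not official RLlib recommendations, and `initial_alpha` (the starting entropy coefficient) is included on the assumption that exploration is the issue; with Ray Tune you would express the same space via `tune.loguniform` / `tune.choice`:

```python
import random

def sample_sac_hparams(rng):
    """Draw one SAC hyperparameter configuration from a random-search
    space. Ranges are illustrative starting points, not recommendations."""
    return {
        "tau": 10 ** rng.uniform(-3, -1.3),             # ~0.001 .. 0.05
        "target_network_update_freq": rng.choice([1, 8, 32]),
        "actor_learning_rate": 10 ** rng.uniform(-5, -3),   # 1e-5 .. 1e-3
        "critic_learning_rate": 10 ** rng.uniform(-5, -3),  # 1e-5 .. 1e-3
        "initial_alpha": 10 ** rng.uniform(-2, 0),      # entropy coefficient
    }

rng = random.Random(0)          # seeded for reproducibility
trials = [sample_sac_hparams(rng) for _ in range(4)]
```

Log-uniform sampling (via `10 ** rng.uniform(...)`) is the usual choice for learning rates and tau, since plausible values span orders of magnitude.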


Hi @Stale_neutrino ,

Have a look at the tuned examples section in the repo to see which parameters we have modified in the past, and to find a good starting point from a hopefully similar problem.

