the hyperparameters for SAC to solve “CartPole-v0”

I followed the link below to set up the hyperparameters for SAC to solve "CartPole-v0", which is a very easy task. However, the mean reward always remains around 10. Am I missing something?

import ray
#import ray.rllib.agents.ppo as ppo
import ray.rllib.agents.sac as sac

config = sac.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
#config["framework"] = "torch"
config["framework"] = "tf"
config["no_done_at_end"] = "false"
config["gamma"] = 0.95
config["target_network_update_freq"] = 32
config["tau"] = 1.0
config["train_batch_size"] = 32
config["optimization"]["actor_learning_rate"] = 0.005
config["optimization"]["critic_learning_rate"] = 0.005
config["optimization"]["entropy_learning_rate"] = 0.0001

#trainer = sac.SACTrainer(config=config, env="MountainCar-v0")
trainer = sac.SACTrainer(config=config, env="CartPole-v0")

for i in range(5000):
    # Perform one iteration of training the policy with SAC
    result = trainer.train()
    if i % 10 == 0:
        #checkpoint =
        print("i: ", i, " reward: ", result["episode_reward_mean"])


Hi, I took a look at your script. A few things:

  1. You are passing the string "false" to the parameter no_done_at_end. Python interprets any non-empty string as a truthy value, so the env outputs episodes without the done bit at the end, which completely confuses the trainer.

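The truthiness pitfall is plain Python, nothing RLlib-specific; a quick sketch:

```python
# Any non-empty string is truthy in Python, so the string "false"
# behaves like True when used in a boolean context.
assert bool("false") is True
assert bool("") is False

# Only an actual boolean (or other falsy value) disables the flag.
flag = "false"
if flag:
    print("flag is treated as enabled")  # this branch runs
```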
  2. You actually don't need to init the config dict with DEFAULT_CONFIG yourself; RLlib will do it for you. E.g., the following script works:

import ray
import ray.rllib.agents.sac as sac


config = {
    'framework': 'tf',
    'gamma': 0.95,
    'no_done_at_end': False,
    'target_network_update_freq': 32,
    'tau': 1.0,
    'train_batch_size': 32,
    'optimization': {
        'actor_learning_rate': 0.005,
        'critic_learning_rate': 0.005,
        'entropy_learning_rate': 0.0001,
    },
}

trainer = sac.SACTrainer(config=config, env="CartPole-v0")

for i in range(5000):
    result = trainer.train()
    if i % 10 == 0:
        print("i: ", i, result["timesteps_total"], " reward: ", result["episode_reward_mean"])

  3. You can actually run the tuned-example yaml file directly using:

rllib train -f rllib/tuned_examples/sac/cartpole-sac.yaml

which saves you from errors when copying over the configuration.


It works. Thanks a lot!
However, I failed to apply the same hyperparameters to "MountainCar-v0". Are there other important hyperparameters in SAC besides the learning rates?

SAC doesn't work particularly well on discrete action spaces. I'd suggest using another algorithm such as PPO, which can be used with a categorical policy (discrete outputs) to learn the MountainCar-v0 problem, or instead using an environment such as MountainCarContinuous-v0.
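If you switch to PPO, the analogous config dict might look like the sketch below. The values are illustrative, not a tuned setting (MountainCar-v0 has sparse rewards, so expect to tune further); you would construct the trainer with ppo.PPOTrainer in the same style as the SAC script above.

```python
# Illustrative PPO config for MountainCar-v0 -- not a tuned setting.
# MountainCar-v0 only rewards reaching the goal, so a high gamma
# matters; other values here are placeholders to adjust.
config = {
    'framework': 'tf',
    'gamma': 0.99,
    'lr': 0.0001,
    'num_workers': 1,
}
```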


Thanks for your suggestion.