The hyperparameters for SAC to solve “CartPole-v0”

yuwenche · February 1, 2022, 3:00am

I followed the link below to set up the hyperparameters for SAC to solve “CartPole-v0”, which is a very easy task. However, the mean reward always stays at about 10. Am I missing something?

https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/sac/cartpole-sac.yaml

cartpole-sac:
    env: CartPole-v0
    run: SAC
    stop:
        episode_reward_mean: 100
        timesteps_total: 100000
    config:
        # Works for both torch and tf.
        framework: tf
        gamma: 0.95
        no_done_at_end: false
        target_network_update_freq: 32
        tau: 1.0
        # initial_alpha: 0.5
        train_batch_size: 32
        optimization:
            actor_learning_rate: 0.005
            critic_learning_rate: 0.005
            entropy_learning_rate: 0.0001
        # grad_norm_clipping: 40.0

This file has been truncated.
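As a side note on the `stop` block above: Tune ends a trial as soon as any listed metric reaches its threshold, whereas a hand-written `trainer.train()` loop has to check this itself. A minimal sketch of such a check follows; `should_stop` and `STOP` are hypothetical names for illustration, not part of RLlib, and plain dicts stand in for the result returned by `trainer.train()`.

```python
# Sketch: emulate the YAML `stop` block in a hand-written training loop.
# `should_stop` is a hypothetical helper, not an RLlib function.

STOP = {"episode_reward_mean": 100, "timesteps_total": 100000}


def should_stop(result, stop=STOP):
    """Return True once any metric named in `stop` has reached its threshold."""
    return any(
        key in result and result[key] >= threshold
        for key, threshold in stop.items()
    )


# Plain dicts standing in for the dict returned by trainer.train():
early = {"episode_reward_mean": 10.2, "timesteps_total": 3200}
solved = {"episode_reward_mean": 101.5, "timesteps_total": 40000}
print(should_stop(early), should_stop(solved))
```

In a manual loop this could be called right after each `trainer.train()`, breaking out when it returns True.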

The code:

import ray
# import ray.rllib.agents.ppo as ppo
import ray.rllib.agents.sac as sac

ray.init()

# Start from the SAC defaults and override the values shown in the YAML above.
config = sac.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
# config["framework"] = "torch"
config["framework"] = "tf"
config["no_done_at_end"] = False
config["gamma"] = 0.95
config["target_network_update_freq"] = 32
config["tau"] = 1.0
config["train_batch_size"] = 32
config["optimization"]["actor_learning_rate"] = 0.005
config["optimization"]["critic_learning_rate"] = 0.005
config["optimization"]["entropy_learning_rate"] = 0.0001

# trainer = sac.SACTrainer(config=config, env="MountainCar-v0")
trainer = sac.SACTrainer(config=config, env="CartPole-v0")

# Train and report the mean episode reward every 10 iterations.
for i in range(5000):
    result = trainer.train()
    if i % 10 == 0:
        # checkpoint = trainer.save()
        print("i: ", i, " reward: ", result["episode_reward_mean"])
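One detail in the script above that may be worth knowing: `dict.copy()` is a shallow copy, so writing to `config["optimization"][...]` also changes the nested dict inside the defaults. This is unlikely to explain the flat reward on its own, since it only matters if the defaults are reused later in the same process. The sketch below shows the effect with a stand-in dict; the default values in it are placeholders, not RLlib's actual defaults.

```python
import copy

# Stand-in for sac.DEFAULT_CONFIG; the default values here are placeholders.
DEFAULT_CONFIG = {
    "gamma": 0.99,
    "optimization": {"actor_learning_rate": 0.0003},
}

# Shallow copy: the nested "optimization" dict is shared with the defaults.
shallow = DEFAULT_CONFIG.copy()
shallow["optimization"]["actor_learning_rate"] = 0.005
leaked = DEFAULT_CONFIG["optimization"]["actor_learning_rate"]
print("default after shallow-copy edit:", leaked)

# Deep copy: nested dicts are duplicated, so the defaults stay untouched.
DEFAULT_CONFIG["optimization"]["actor_learning_rate"] = 0.0003  # reset placeholder
deep = copy.deepcopy(DEFAULT_CONFIG)
deep["optimization"]["actor_learning_rate"] = 0.005
kept = DEFAULT_CONFIG["optimization"]["actor_learning_rate"]
print("default after deep-copy edit:", kept)
```

Using `copy.deepcopy(sac.DEFAULT_CONFIG)` in place of `.copy()` would avoid the shared nested dict.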

gjoliver · February 7, 2022, 1:12am

This should work. I just tested it myself while answering another thread.
Actually, you can quickly verify this with:

rllib train -f rllib/tuned_examples/sac/cartpole-sac.yaml

sven1977 · February 9, 2022, 10:37am

Hey @yuwenche, I’m with @gjoliver here. This particular test is part of our CI testing suite, so it’s guaranteed to work (it’s tested on every single PR, and we won’t merge anything into master if these tests don’t pass).
Could you send us a small, self-contained repro script that reproduces the problem of the agent not learning?
Thanks
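Before putting together a repro, one way to narrow things down is to diff the hand-built config against the tuned example. The sketch below does this for the part of the YAML visible in the first post; `diff_configs` is a hypothetical helper, and whatever follows the truncation in the YAML is not covered by the comparison.

```python
# Sketch: compare a hand-built config against the visible part of the YAML.
# `diff_configs` is a hypothetical helper, not an RLlib function.

MISSING = "<missing>"


def diff_configs(expected, actual, prefix=""):
    """List (path, expected_value, actual_value) for keys that are missing or differ."""
    diffs = []
    for key, exp_val in expected.items():
        path = f"{prefix}{key}"
        if key not in actual:
            diffs.append((path, exp_val, MISSING))
        elif isinstance(exp_val, dict) and isinstance(actual[key], dict):
            diffs.extend(diff_configs(exp_val, actual[key], prefix=path + "."))
        elif exp_val != actual[key]:
            diffs.append((path, exp_val, actual[key]))
    return diffs


# Values visible in the YAML excerpt (the truncated remainder is unknown).
yaml_visible = {
    "framework": "tf",
    "gamma": 0.95,
    "no_done_at_end": False,
    "target_network_update_freq": 32,
    "tau": 1.0,
    "train_batch_size": 32,
    "optimization": {
        "actor_learning_rate": 0.005,
        "critic_learning_rate": 0.005,
        "entropy_learning_rate": 0.0001,
    },
}

# Values set in the posted script.
script_config = dict(yaml_visible, num_gpus=0)

print(diff_configs(yaml_visible, script_config))

# Hypothetical mismatch, only to show what a reported difference looks like.
altered = dict(script_config, tau=0.5)
print(diff_configs(yaml_visible, altered))
```

For the visible keys the posted script matches the YAML, so any remaining difference would have to come from settings in the truncated part of the file or from the environment.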

yuwenche · February 10, 2022, 4:13am

Thanks for your replies.
I have responded in another thread, linked below. I actually used two different user names: one tied to my GitHub account, the other from a new signup. Sorry for the inconvenience.

https://discuss.ray.io/t/the-hyperparameters-for-sac-to-solve-cartpole-v0/4909
