The hyperparameters for SAC to solve “CartPole-v0”

I followed the link below to set up the hyperparameters for SAC to solve “CartPole-v0”, which is a very easy task. However, the mean reward always stays around 10. Am I missing something?

The code:

import ray
# import ray.rllib.agents.ppo as ppo
import ray.rllib.agents.sac as sac

config = sac.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
# config["framework"] = "torch"
config["framework"] = "tf"
config["no_done_at_end"] = False
config["gamma"] = 0.95
config["target_network_update_freq"] = 32
config["tau"] = 1.0
config["train_batch_size"] = 32
config["optimization"]["actor_learning_rate"] = 0.005
config["optimization"]["critic_learning_rate"] = 0.005
config["optimization"]["entropy_learning_rate"] = 0.0001

# trainer = sac.SACTrainer(config=config, env="MountainCar-v0")
trainer = sac.SACTrainer(config=config, env="CartPole-v0")

for i in range(5000):
    result = trainer.train()
    if i % 10 == 0:
        # checkpoint =
        print("i:", i, "reward:", result["episode_reward_mean"])
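As an aside on two of the settings above: with `tau = 1.0`, the target-network update performed every `target_network_update_freq` steps becomes a hard copy of the online weights, while smaller values give a Polyak average. A minimal sketch of that blend (plain Python for illustration, not RLlib's actual implementation):

```python
# Polyak (soft) target-network update:
#   new_target = tau * online + (1 - tau) * old_target
# With tau = 1.0 this degenerates to a hard copy of the online weights.
def soft_update(target_weights, online_weights, tau):
    return [tau * w + (1.0 - tau) * t
            for w, t in zip(online_weights, target_weights)]

online = [1.0, 2.0]
target = [0.0, 0.0]
print(soft_update(target, online, 1.0))  # hard copy: [1.0, 2.0]
print(soft_update(target, online, 0.5))  # blend: [0.5, 1.0]
```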

This should work; I just tested it myself while answering another thread.
You can quickly verify it with:

rllib train -f rllib/tuned_examples/sac/cartpole-sac.yaml 

Hey @yuwenche, I’m with @gjoliver here: this particular test is part of our CI-testing suite, so it’s guaranteed to work (it’s run on every single PR, and we won’t merge anything into master if these tests don’t pass).
Could you send us a small, self-contained repro script that reproduces your issue with not learning?

Thanks for your replies.
I have responded in another thread, at the following link. Actually, I used two different user names: one with my GitHub account, the other from a new signup. I am sorry for the inconvenience.