SAC terminates early during training

I am configuring the SAC DRL algorithm with a GPU to train my custom environment. The same worker and GPU allocation configuration works fine with PPO, but training terminates early with SAC.

Any help will be appreciated!

Below is the code and configuration I am using to train agents for my environment.
System and library configuration:
Ray version: 1.4.0
Python Version: 3.7.10
Tensorflow version: 1.14.0
OS: Fedora 34

#!/usr/bin/env python
# encoding: utf-8
from NR_IES.envs.NR_IES_env import NR_IES_v0
from ray.tune.registry import register_env
import gym
import os
import ray
from ray.rllib.agents.sac.sac import SACTrainer, DEFAULT_CONFIG
import shutil
from ray import tune
import logging

def main():
    chkpt_root = "sac_training_26oct/NR_IES"
    shutil.rmtree(chkpt_root, ignore_errors=True, onerror=None)
    ray.init(local_mode=True, include_dashboard=False, logging_level=logging.DEBUG)

    # custom environment registration
    select_env = "NR_IES-v0"
    register_env(select_env, lambda config: NR_IES_v0())

    # create agent and environment configuration
    config = DEFAULT_CONFIG.copy()
    config["log_level"] = "WARN"
    config['num_workers'] = 10
    config['num_gpus'] = 1

    # give each of the 10 rollout workers a small fractional share of a GPU
    config['num_gpus_per_worker'] = (1 - 0.0001) / 10

    # build the SAC trainer directly (rather than through tune.run)
    agent = SACTrainer(config, env=select_env)
    
    status = "{:2d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:4.2f} saved {}"
    n_iter = 1000000

    for n in range(n_iter):
        result = agent.train()
        chkpt_file = agent.save(chkpt_root)
        print(status.format(
                n + 1,
                result["episode_reward_min"],
                result["episode_reward_mean"],
                result["episode_reward_max"],
                result["episode_len_mean"],
                chkpt_file
                ))

if __name__ == "__main__":
    main()

https://docs.ray.io/en/latest/rllib-training.html

Hi @Suba_Sah,

could you check whether the same thing happens if you use Tune instead of calling the Trainer directly? It is recommended to run RLlib Trainers with Tune.

Specifically, you call tune.run() with the SACTrainer (or the string "SAC") and your config as arguments, as shown here for PPO:

import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    stop={"episode_reward_mean": 200},
    config={
        "env": "CartPole-v0",
        "num_gpus": 0,
        "num_workers": 1,
        "lr": tune.grid_search([0.01, 0.001, 0.0001]),
    },
)
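
For your case, a minimal sketch of the same pattern adapted to SAC and your registered environment could look like the following. The stopping criterion, checkpoint settings, and result directory are placeholders you would adjust; the environment registration is taken from your script.

import logging

import ray
from ray import tune
from ray.tune.registry import register_env

from NR_IES.envs.NR_IES_env import NR_IES_v0

ray.init(include_dashboard=False, logging_level=logging.INFO)

# register the custom environment under the same name used in your script
register_env("NR_IES-v0", lambda config: NR_IES_v0())

tune.run(
    "SAC",                                # or SACTrainer
    stop={"training_iteration": 1000},    # placeholder stopping criterion
    checkpoint_freq=1,                    # save a checkpoint every iteration
    local_dir="sac_training_26oct",       # placeholder result directory
    config={
        "env": "NR_IES-v0",
        "num_gpus": 1,
        "num_workers": 10,
    },
)

If it still stops early under Tune, the trial result and error files written under local_dir should show why.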

Hope this helps

Hi @Suba_Sah,

What do you mean by it terminates early?

Are you seeing an error?

Is your for loop stopping before the number of iterations you specified?

Something else?