SAC terminates early during training

I am configuring the SAC DRL algorithm with a GPU to train my custom environment. The same worker and GPU allocation configuration works fine with PPO, but training terminates early with SAC.

Any help will be appreciated!

Below is the code and configuration I am using to train agents for my environment.
System and library configuration:
Ray version: 1.4.0
Python Version: 3.7.10
Tensorflow version: 1.14.0
OS: Fedora 34

#!/usr/bin/env python
# encoding: utf-8
from NR_IES.envs.NR_IES_env import NR_IES_v0
from ray.tune.registry import register_env
import gym
import os
import ray
from ray.rllib.agents.sac.sac import SACTrainer, DEFAULT_CONFIG
import shutil
from ray import tune
import logging

def main():
    chkpt_root = "sac_training_26oct/NR_IES"
    shutil.rmtree(chkpt_root, ignore_errors=True, onerror=None)
    ray.init(local_mode=True, include_dashboard=False, logging_level=logging.DEBUG)

    # custom environment registration
    select_env = "NR_IES-v0"
    register_env(select_env, lambda config: NR_IES_v0())

    # create agent and environment configuration
    config = DEFAULT_CONFIG.copy()
    config["log_level"] = "WARN"
    config['num_workers'] = 10
    config['num_gpus'] = 1

    # give each of the 10 rollout workers a small fractional share of a GPU
    config['num_gpus_per_worker'] = (1 - 0.0001) / 10

    # build the SAC trainer directly (rather than through tune.run)
    agent = SACTrainer(config, env=select_env)
    
    status = "{:2d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:4.2f} saved {}"
    n_iter = 1000000

    for n in range(n_iter):
        result = agent.train()
        chkpt_file = agent.save(chkpt_root)
        print(status.format(
                n + 1,
                result["episode_reward_min"],
                result["episode_reward_mean"],
                result["episode_reward_max"],
                result["episode_len_mean"],
                chkpt_file
                ))

if __name__ == "__main__":
    main()

https://docs.ray.io/en/latest/rllib-training.html

Hi @Suba_Sah,

could you check whether the same thing happens if you use Tune instead of calling the Trainer directly? It is recommended to run RLlib Trainers with Tune.

Specifically, you call tune.run() with the SACTrainer (or the string "SAC") and your config as arguments, as shown here for PPO:

import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    stop={"episode_reward_mean": 200},
    config={
        "env": "CartPole-v0",
        "num_gpus": 0,
        "num_workers": 1,
        "lr": tune.grid_search([0.01, 0.001, 0.0001]),
    },
)
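
For your case, a minimal sketch of the same pattern adapted to SAC and your registered environment could look like the following. The stopping criterion, checkpoint settings, and result directory are placeholders you would adjust; the environment registration is taken from your script.

import logging

import ray
from ray import tune
from ray.tune.registry import register_env

from NR_IES.envs.NR_IES_env import NR_IES_v0

ray.init(include_dashboard=False, logging_level=logging.INFO)

# register the custom environment under the same name used in your script
register_env("NR_IES-v0", lambda config: NR_IES_v0())

tune.run(
    "SAC",                                # or SACTrainer
    stop={"training_iteration": 1000},    # placeholder stopping criterion
    checkpoint_freq=1,                    # save a checkpoint every iteration
    local_dir="sac_training_26oct",       # placeholder result directory
    config={
        "env": "NR_IES-v0",
        "num_gpus": 1,
        "num_workers": 10,
    },
)

If it still stops early under Tune, the trial result and error files written under local_dir should show why.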

Hope this helps

Hi @Suba_Sah,

What do you mean by it terminates early?

Are you seeing an error?

Is your for loop stopping before the number of iterations you specified?

Something else?