Reproducing results from Stable Baselines 3

Hello,
I am currently trying to apply RL to a global optimization problem. I was able to apply the single-agent Soft Actor-Critic (SAC) method to my custom environment using the Stable Baselines 3 library, and I would now like to continue this research by applying multi-agent RL with Ray's RLlib. I have been trying to replicate the Stable Baselines 3 results in RLlib, but so far I have not been able to reproduce them.
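
For reference, the Stable Baselines 3 run I am comparing against looks roughly like this (a simplified sketch; the hyperparameters shown are the SB3 defaults, and I am assuming here that the environment class can be instantiated with an empty config dict):

from stable_baselines3 import SAC
from single_env_rllib import env

# Simplified SB3 baseline: default SAC hyperparameters (lr=3e-4)
# and the same total timestep budget as the RLlib run below.
train_env = env({})  # assumption: the env class accepts an (optionally empty) config dict
model = SAC("MlpPolicy", train_env, learning_rate=3e-4, verbose=1)
model.learn(total_timesteps=200000)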

My custom environment has the following characteristics (a minimal stand-in sketch is given after the list):

  1. Single-agent environment (to be converted to multi-agent in the future)
  2. Continuous action space in the range (0, 1), dim = (4,)
  3. Observation space, dim = (16,)
  4. Each episode terminates after 1 timestep.
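
A minimal stand-in environment with the same interface would look roughly like this (the observation bounds and the reward below are placeholders, not my actual objective):

import gym
import numpy as np
from gym.spaces import Box

class StandInEnv(gym.Env):
    """Dummy env with the same interface: obs dim (16,), actions in (0, 1)
    with dim (4,), and episodes that terminate after a single timestep."""

    def __init__(self, env_config=None):
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(16,), dtype=np.float32)
        self.action_space = Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        reward = float(-np.sum((action - 0.5) ** 2))  # placeholder reward
        done = True                                   # every episode lasts one timestep
        return self.observation_space.sample(), reward, done, {}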

The major problem I am facing is that the actions are always extreme: most of the time they are either very close to 1 or exactly 0.
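
To quantify this, a small helper along the following lines can be used to measure how often the policy's outputs land at the bounds of the action space (the wrapper name and the tolerance are arbitrary, just for illustration):

import gym
import numpy as np

class ActionLoggingWrapper(gym.Wrapper):
    """Record every action sent to the env and report how often its
    components saturate at the action space's low/high bounds."""

    def __init__(self, env, tol=1e-3):
        super().__init__(env)
        self.tol = tol
        self.actions = []

    def step(self, action):
        self.actions.append(np.asarray(action, dtype=np.float32))
        return self.env.step(action)

    def saturation_fraction(self):
        # Fraction of action components within `tol` of the bounds.
        a = np.stack(self.actions)
        low, high = self.action_space.low, self.action_space.high
        at_bounds = (a <= low + self.tol) | (a >= high - self.tol)
        return float(at_bounds.mean())

Wrapping the environment with this during evaluation and printing saturation_fraction() makes it easy to compare the behaviour of the SB3 and RLlib agents.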

Can someone tell me what I am doing wrong?

import numpy as np
import matplotlib.pyplot as plt

import RlHelper                    # custom training helper class (see the note below)
from single_env_rllib import env   # the custom environment described above
from ray.rllib.agents.registry import get_trainer_class

filename = '/home/ichbinram/Documents/IFN/dataset/dset4.h5'

# Start from RLlib's default SAC configuration.
agent_class, config = get_trainer_class("SAC", return_config=True)

config['env'] = env
config['framework'] = 'torch'
config['lr'] = 0.0003
config['horizon'] = 1                     # episodes terminate after a single timestep
config['normalize_actions'] = True        # policy outputs are unsquashed to the env's (0, 1) bounds
config['timesteps_per_iteration'] = 200   # minimum env timesteps per training iteration
stop = {'timesteps_total': 200000}
log_dir = './trials'

# Train, restore the best checkpoint, and evaluate on the test dataset.
trainer = RlHelper.RlHelper(config=config, save_dir=log_dir)
checkpoint_path, analysis = trainer.train(stop_criteria=stop)
trainer.load(checkpoint_path)
reward, p_max = trainer.test(filename)

# Reshape the test rewards into (runs, 51 p_max values) and average over runs.
plot = np.reshape(reward, (int(len(reward) / 51), 51))
plot = np.nanmean(plot, axis=0)

plt.plot(p_max, np.transpose(plot))
plt.ylabel('reward (Mbits/J)')
plt.xlabel('p_max (dBW)')
plt.show()

I have also created a custom training class, following the pattern shown in this GitHub issue.
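
For context, the helper follows roughly this pattern (a simplified, hypothetical sketch of the train/load part only; the real class also implements the test() method used above, which is specific to my dataset):

from ray import tune
from ray.rllib.agents.sac import SACTrainer

class RlHelper:
    """Sketch of the training helper: train via tune.run, then restore
    a plain SACTrainer from the best checkpoint."""

    def __init__(self, config, save_dir):
        self.config = config
        self.save_dir = save_dir
        self.agent = None

    def train(self, stop_criteria):
        analysis = tune.run(
            SACTrainer,
            config=self.config,
            stop=stop_criteria,
            local_dir=self.save_dir,
            checkpoint_at_end=True,
        )
        trial = analysis.get_best_trial(metric="episode_reward_mean", mode="max")
        checkpoint = analysis.get_best_checkpoint(trial, metric="episode_reward_mean", mode="max")
        return checkpoint, analysis

    def load(self, checkpoint_path):
        self.agent = SACTrainer(config=self.config)
        self.agent.restore(checkpoint_path)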

Hi ichbinram,
I would like to reproduce your problem.
Could you provide an example without from single_env_rllib import env, but with another environment that yields the same problem?
Cheers

Hello arturn,
I was looking for another environment to reproduce the error, but it turns out that after the recent update the library works fine. I did not go through the commits to find out what changed, but it is working now. Thank you for your reply!

P.S. If there is no way to close this thread, please consider it closed.