Reproducing results from Stable Baselines 3

Hello,
I am currently trying to apply RL to a global optimization problem. I was able to apply the single-agent Soft Actor-Critic (SAC) method to my custom environment using the Stable Baselines 3 library, and I would now like to continue this research by applying multi-agent RL with Ray's RLlib. I have been trying to replicate the Stable Baselines 3 results in RLlib, but so far I have not been able to reproduce them.
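
For reference, the Stable Baselines 3 run I am comparing against looks roughly like this (a simplified sketch; the hyperparameters shown are the SB3 defaults, and I am assuming here that the environment class can be instantiated with an empty config dict):

from stable_baselines3 import SAC
from single_env_rllib import env

# Simplified SB3 baseline: default SAC hyperparameters (lr=3e-4)
# and the same total timestep budget as the RLlib run below.
train_env = env({})  # assumption: the env class accepts an (optionally empty) config dict
model = SAC("MlpPolicy", train_env, learning_rate=3e-4, verbose=1)
model.learn(total_timesteps=200000)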

My custom environment has the following characteristics (a minimal stand-in sketch is given after the list):

  1. Single-agent environment (to be converted to multi-agent in the future)
  2. Continuous action space in the range (0, 1), dim = (4,)
  3. Observation space, dim = (16,)
  4. Each episode terminates after 1 timestep.
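
A minimal stand-in environment with the same interface would look roughly like this (the observation bounds and the reward below are placeholders, not my actual objective):

import gym
import numpy as np
from gym.spaces import Box

class StandInEnv(gym.Env):
    """Dummy env with the same interface: obs dim (16,), actions in (0, 1)
    with dim (4,), and episodes that terminate after a single timestep."""

    def __init__(self, env_config=None):
        self.observation_space = Box(low=-np.inf, high=np.inf, shape=(16,), dtype=np.float32)
        self.action_space = Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        reward = float(-np.sum((action - 0.5) ** 2))  # placeholder reward
        done = True                                   # every episode lasts one timestep
        return self.observation_space.sample(), reward, done, {}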

The major problem I am facing is that the actions are always extreme: most of the time they are either very close to 1 or exactly 0.
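
To quantify this, a small helper along the following lines can be used to measure how often the policy's outputs land at the bounds of the action space (the wrapper name and the tolerance are arbitrary, just for illustration):

import gym
import numpy as np

class ActionLoggingWrapper(gym.Wrapper):
    """Record every action sent to the env and report how often its
    components saturate at the action space's low/high bounds."""

    def __init__(self, env, tol=1e-3):
        super().__init__(env)
        self.tol = tol
        self.actions = []

    def step(self, action):
        self.actions.append(np.asarray(action, dtype=np.float32))
        return self.env.step(action)

    def saturation_fraction(self):
        # Fraction of action components within `tol` of the bounds.
        a = np.stack(self.actions)
        low, high = self.action_space.low, self.action_space.high
        at_bounds = (a <= low + self.tol) | (a >= high - self.tol)
        return float(at_bounds.mean())

Wrapping the environment with this during evaluation and printing saturation_fraction() makes it easy to compare the behaviour of the SB3 and RLlib agents.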

Can someone tell me what I am doing wrong?

import numpy as np
import matplotlib.pyplot as plt

import RlHelper                    # custom training helper class (see the note below)
from single_env_rllib import env   # the custom environment described above
from ray.rllib.agents.registry import get_trainer_class

filename = '/home/ichbinram/Documents/IFN/dataset/dset4.h5'

# Start from RLlib's default SAC configuration.
agent_class, config = get_trainer_class("SAC", return_config=True)

config['env'] = env
config['framework'] = 'torch'
config['lr'] = 0.0003
config['horizon'] = 1                     # episodes terminate after a single timestep
config['normalize_actions'] = True        # policy outputs are unsquashed to the env's (0, 1) bounds
config['timesteps_per_iteration'] = 200   # minimum env timesteps per training iteration
stop = {'timesteps_total': 200000}
log_dir = './trials'

# Train, restore the best checkpoint, and evaluate on the test dataset.
trainer = RlHelper.RlHelper(config=config, save_dir=log_dir)
checkpoint_path, analysis = trainer.train(stop_criteria=stop)
trainer.load(checkpoint_path)
reward, p_max = trainer.test(filename)

# Reshape the test rewards into (runs, 51 p_max values) and average over runs.
plot = np.reshape(reward, (int(len(reward) / 51), 51))
plot = np.nanmean(plot, axis=0)

plt.plot(p_max, np.transpose(plot))
plt.ylabel('reward (Mbits/J)')
plt.xlabel('p_max (dBW)')
plt.show()

I have also created a custom training class, following the pattern shown in this GitHub issue.
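
For context, the helper follows roughly this pattern (a simplified, hypothetical sketch of the train/load part only; the real class also implements the test() method used above, which is specific to my dataset):

from ray import tune
from ray.rllib.agents.sac import SACTrainer

class RlHelper:
    """Sketch of the training helper: train via tune.run, then restore
    a plain SACTrainer from the best checkpoint."""

    def __init__(self, config, save_dir):
        self.config = config
        self.save_dir = save_dir
        self.agent = None

    def train(self, stop_criteria):
        analysis = tune.run(
            SACTrainer,
            config=self.config,
            stop=stop_criteria,
            local_dir=self.save_dir,
            checkpoint_at_end=True,
        )
        trial = analysis.get_best_trial(metric="episode_reward_mean", mode="max")
        checkpoint = analysis.get_best_checkpoint(trial, metric="episode_reward_mean", mode="max")
        return checkpoint, analysis

    def load(self, checkpoint_path):
        self.agent = SACTrainer(config=self.config)
        self.agent.restore(checkpoint_path)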

Hi ichbinram,
I would like to reproduce your problem.
Could you provide an example without from single_env_rllib import env, but with another environment that yields the same problem?
Cheers

Hello arturn,
I was looking for another environment to reproduce the error, but it turns out that after the recent update the library works fine. I did not go through the commits to find out what changed, but it is working now. Thank you for your reply!

P.S. If there is no way to close this thread, please consider it closed.