I am trying to build a trainable class for a stock-market trading environment, where the agent trains for a set duration and is then validated on a separate validation environment. The default RLlib Algorithms, when passed as a string to the trainable, do not do this out of the box, so I am trying to build a custom Trainable class (if there is a way to do this that I am missing, please let me know).
My custom Trainable class looks like the following:
import os

from ray import tune
from ray.air import session
from ray.rllib.algorithms.ppo import PPO, PPOConfig
from ray.tune.logger import pretty_print


class MyTrainableClass(tune.Trainable):
    def setup(self, config: dict):
        self.train_iters = config["training_iterations"]
        self.algo = PPO(config=config)

    def step(self):
        # Train for the configured number of iterations ...
        for i in range(self.train_iters):
            results = self.algo.train()
        # ... then score the trained policy on the validation environment.
        sharpe_ratio = validation_func(self.algo, validation_env)
        session.report({"sharpe": sharpe_ratio})
        return results

    def save_checkpoint(self, tmp_checkpoint_dir):
        checkpoint_path = os.path.join(tmp_checkpoint_dir, "model.pth")
        self.algo.save_checkpoint(checkpoint_path)
        return tmp_checkpoint_dir

    def load_checkpoint(self, tmp_checkpoint_dir):
        checkpoint_path = os.path.join(tmp_checkpoint_dir, "model.pth")
        self.algo.restore(checkpoint_path)
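For context, validation_func is not shown above; a minimal sketch of the kind of thing I mean by it — roll out the current policy on the validation environment and compute a Sharpe ratio from per-episode returns — would look like this (a simplified illustration assuming Gymnasium-style reset()/step() signatures, not my exact function):

import numpy as np


def validation_func(algo, env, num_episodes: int = 10):
    """Roll out the trained policy on a validation env and return a Sharpe ratio.

    Simplified sketch: assumes a Gymnasium-style env where
    reset() -> (obs, info) and step() -> (obs, reward, terminated, truncated, info).
    """
    episode_returns = []
    for _ in range(num_episodes):
        obs, info = env.reset()
        done = False
        total_reward = 0.0
        while not done:
            # Use the policy greedily (no exploration) for validation.
            action = algo.compute_single_action(obs, explore=False)
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            done = terminated or truncated
        episode_returns.append(total_reward)
    returns = np.array(episode_returns)
    # Sharpe ratio over per-episode returns (no risk-free rate, simplified).
    return returns.mean() / (returns.std() + 1e-8)

I launch the trainable with something along the lines of tune.Tuner(MyTrainableClass, param_space=config).fit().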
But in Ray 2.3.0, we are expected to port our environments from the old gym API to the Gymnasium API. After doing so, I am getting this error:
2023-03-20 02:15:22,685 INFO worker.py:1544 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(RolloutWorker pid=1663) 2023-03-20 02:15:28,133 WARNING env.py:156 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(RolloutWorker pid=1663) 2023-03-20 02:15:28,133 WARNING env.py:166 -- Your env reset() method appears to take 'seed' or 'return_info' arguments. Note that these are not yet supported in RLlib. Seeding will take place using 'env.seed()' and the info dict will not be returned from reset.
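For reference, after the port my environment follows the standard Gymnasium signatures; a simplified stub (not my actual trading env, just the API shape) looks like this:

import gymnasium as gym
import numpy as np
from gymnasium import spaces


class TradingEnv(gym.Env):
    """Simplified stub showing the Gymnasium-style API I ported to."""

    def __init__(self, config=None):
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(10,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)  # e.g. buy / hold / sell
        self._t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._t = 0
        obs = np.zeros(10, dtype=np.float32)
        return obs, {}  # Gymnasium: reset() returns (obs, info)

    def step(self, action):
        self._t += 1
        obs = np.zeros(10, dtype=np.float32)
        reward = 0.0
        terminated = False
        truncated = self._t >= 200
        return obs, reward, terminated, truncated, {}  # Gymnasium: 5-tuple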
This environment worked well when I used a default RLlib trainable such as trainable="PPO", but it is failing with my custom Trainable. Even if I change the environment back to the old gym style, it complains that the environment checker is failing (I also set disable_env_checking=True, but it still fails).
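For completeness, this is roughly how I set disable_env_checking (a simplified sketch using the stub env class above, not my full config):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(env=TradingEnv, disable_env_checking=True)  # checker still complains
    .rollouts(num_rollout_workers=1)
)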
So, is there a better way to do this (training followed by validation)? And am I missing something in the above code?
python==3.10.6
Ray==2.3.0
Ubuntu 22.04