Recommended Way to Use Stop Parameters in algo.train()

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I am running a derivative of the SimpleCorridor environment to train with DreamerV3.

This is the environment:

import gymnasium as gym
import numpy as np
from gymnasium.spaces import Box, Discrete


class SimpleCorridorReward(gym.Env):
    """Corridor env: the agent must walk right until it reaches the end position."""

    def __init__(self, config=None):
        config = config or {}
        self.end_pos = config.get("corridor_length", 10)
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(low=0.0, high=999.0, shape=(1,), dtype=np.float32)
        self.observations = None

    def set_corridor_length(self, length):
        self.end_pos = length
        self.observation_space = Box(low=0.0, high=self.end_pos, shape=(1,), dtype=np.float32)
        print("Updated corridor length to {}".format(length))

    def reset(self, *, seed=None, options=None):
        self.cur_pos = 0.0
        self.observations = np.array([self.cur_pos], dtype=np.float32)
        return self.observations, {}

    def step(self, action):
        assert action in [0, 1], action
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1.0
        elif action == 1:
            self.cur_pos += 1.0
        # Episode ends once the agent reaches the end of the corridor.
        done = truncated = self.cur_pos >= self.end_pos
        reward = 100 if done else -1
        self.observations = np.array([self.cur_pos], dtype=np.float32)
        return self.observations, reward, done, truncated, {}
This is the training script:

import gymnasium as gym
from gymnasium.envs.registration import register
from ray import tune
from test_envs import SimpleCorridorReward
from ray.rllib.utils import check_env
from ray.tune.logger import pretty_print
from ray.rllib.algorithms.dreamerv3 import DreamerV3Config

#tune.register_env("SimpleCorridorReward", lambda config: SimpleCorridorReward())

register(
    id='SimpleCorridorReward-v0',
    entry_point='test_envs:SimpleCorridorReward',
)

env = gym.make("SimpleCorridorReward-v0")

stop = {"timesteps_total": 100000}

config = DreamerV3Config()
config = config.environment("SimpleCorridorReward-v0")
config = config.training(
    model_size="XS",
    training_ratio=1,
    model={
        "batch_size_B": 1,
        "batch_length_T": 1,
        "horizon_H": 1,
        "gamma": 0.997,
        "model_size": "XS",
    },
)

config = config.resources(num_learner_workers=0)

algo = config.build()
for _ in range(5):
    result = algo.train()
    print(pretty_print(result))

I can set the number of training iterations via the loop, but I cannot use the stop value to end training, e.g. results = tune.run("DreamerV3", config=config, stop=stop, verbose=0) returns:

TuneError: ('Trials did not complete', [DreamerV3_SimpleCorridorReward-v0_bbf54_00000])

Long story short, what is the recommended way to set a stopping point for algo.train() that is part of the config?
The examples I see in the documentation are all related to tune.run; I just want to train the agent, not do any hyperparameter tuning.

This error seems to indicate problems with other parts of the training and not the stop configuration. Do you have the full logs?

Hey, it was related to ray.tune.register_env; I fixed it. This is a follow-up question: what is the recommended way to evaluate the custom environment during training in DreamerV3? E.g. train for 10000 steps and evaluate every 1000 steps.
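(For reference, a minimal sketch of the register_env fix, assuming SimpleCorridorReward is importable from test_envs:)

from ray import tune
from test_envs import SimpleCorridorReward

# Register the env under the name passed to config.environment(); RLlib resolves
# custom env names through Tune's registry rather than through gym.make().
tune.register_env("SimpleCorridorReward-v0", lambda env_config: SimpleCorridorReward(env_config))

This is my attempt at adding evaluation: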

from ray.rllib.algorithms.dreamerv3 import DreamerV3Config

config = DreamerV3Config()
config = config.environment("SimpleCorridorReward-v0")
config = config.training(
    model_size="XS",
    training_ratio=1,
    model={
        "batch_size_B": 1,
        "batch_length_T": 1,
        "horizon_H": 1,
        "gamma": 0.997,
        "model_size": "XS",
    },
)

config = config.resources(num_learner_workers=0)
config = config.evaluation(evaluation_interval=1)

algo = config.build()
for _ in range(10):
    result = algo.train()
    print(result['training_iteration'], result['timesteps_total'])

This returns an error:
ValueError: When using an EnvRunner class that's not a subclass of RolloutWorker (yours is DreamerV3EnvRunner), config.enable_async_evaluation must be set to True! Call config.evaluation(enable_async_evaluation=True) on your config object to fix this problem.

Can you tell me how one is supposed to set up training and evaluation in DreamerV3?

What is the use of this online “community” if no one is going to “communicate” the recommended ways to run code in Ray? This space is supposed to be helpful for the learning curve among other things, no?

@kapibarek I am sorry to hear that you are running into problems when using the library. Happy to see that @MichaelXCC could help you fix the first of the errors.

If you don't run your algorithm with Tune, it is your responsibility to stop it inside your loop, either after a fixed number of iterations or based on a stop signal from a value in the result dictionary returned by the train() method (e.g. episode_reward_mean).
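For example, a rough sketch (the stop values are placeholders, and result keys can differ between Ray versions):

algo = config.build()

# Stop either after a fixed timestep budget or once the mean return is high enough.
max_timesteps = 100_000
target_return = 90.0

while True:
    result = algo.train()
    timesteps = result["timesteps_total"]
    mean_return = result.get("episode_reward_mean", float("-inf"))
    print(f"iter={result['training_iteration']} timesteps={timesteps} return={mean_return}")
    if timesteps >= max_timesteps or mean_return >= target_return:
        break

algo.stop()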

Regarding evaluation, the error message already tells you what to do: set enable_async_evaluation=True via config.evaluation(). You might also need to set evaluation_num_workers to at least 1.

For further evaluation options, take a look at the docs, specifically evaluation_interval and evaluation_duration.
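Roughly something along these lines (an untested sketch; double-check the exact parameter names against your Ray version):

config = config.evaluation(
    evaluation_interval=1,           # evaluate after every train() iteration
    evaluation_duration=10,          # e.g. 10 episodes per evaluation round
    evaluation_duration_unit="episodes",
    evaluation_num_workers=1,
    enable_async_evaluation=True,    # required for non-RolloutWorker EnvRunners such as DreamerV3's
)

algo = config.build()
for i in range(10):
    result = algo.train()
    # If an evaluation ran this iteration, its metrics show up under result["evaluation"].
    if "evaluation" in result:
        print(i, result["evaluation"].get("episode_reward_mean"))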

Hey, thanks, I figured it out through trial and error. The learning curve for RayRL is a bit steep; your reply has been very helpful!
