1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
[O] Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.
2. Environment:
Ray version: 2.43
Python version: 3.12.9
OS: Windows 11
Cloud/Infrastructure:
Other libs/tools (if relevant): torch 2.5, numpy 2.2
3. What happened vs. what you expected:
Expected: the algo.train() call to return once an episode terminates.
Actual: after the episode terminates, the environment resets and sample collection starts again until the total step count hits 1000.
This is my current code:
from ray.rllib.algorithms.dqn import DQNConfig

my_dqn_config = (
    DQNConfig()
    .environment(
        env="my_env",
        env_config=my_config,
    )
    .training(
        replay_buffer_config=replay_buffer_config,
        # train_batch_size_per_learner=32,
        # num_epochs=3,
        # shuffle_batch_per_epoch=True,
        model={
            "fcnet_hiddens": [256, 256],
        },
        num_steps_sampled_before_learning_starts=100,
    )
    .learners(
        # TODO(@chungs4): This will fail without setting the environment variable via
        # ray.init(runtime_env={"env_vars": {"USE_LIBUV": "0"}}), due to changes in PyTorch >= 2.4.0.
        num_learners=1,
        # TODO(@chungs4): GPU implementation with NCCL.
        # num_gpus_per_learner=1,
    )
    .env_runners(
        num_env_runners=1,
        batch_mode="complete_episodes",
    )
)
The training starts after collecting around 100 samples (due to num_steps_sampled_before_learning_starts=100), and each batch is cut off at episode boundaries (due to batch_mode="complete_episodes"). However, training keeps running even though I want the train() call to stop after one episode.
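For context, this is roughly how I run it (a sketch; MyEnv stands in for my custom environment class, and my_config / replay_buffer_config are defined earlier):

from ray.tune.registry import register_env

register_env("my_env", lambda cfg: MyEnv(cfg))  # MyEnv is my custom env (placeholder name)

algo = my_dqn_config.build()
result = algo.train()  # I expected this to return once the first episode terminates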
Can you please post the stop_config that you are using with your training? There is an example of it being used in RLlib here: Replay Buffers — Ray 2.43.0
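For reference, a minimal sketch of what I mean, with an illustrative stop criterion (the key and value here are placeholders, not a recommendation for your setup):

from ray import train, tune

tuner = tune.Tuner(
    "DQN",
    param_space=my_dqn_config,
    run_config=train.RunConfig(
        # Tune checks these criteria against the result dict returned by each
        # train() iteration; "training_iteration" is a built-in stop key.
        stop={"training_iteration": 5},
    ),
)
results = tuner.fit()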
It turned out to be a .reporting() problem rather than a training one, since it directly affects how data is reported to TensorBoard. I got around it by setting:
.reporting(
    # Every episode lasts at least 30 timesteps, so a minimum of 10 is always
    # satisfied by a single episode.
    min_sample_timesteps_per_iteration=10,
)
Can I also set min_sample_timesteps_per_iteration using stop_config?
I hard-coded the value min_sample_timesteps_per_iteration=10 because every episode runs at least 30 timesteps, so each train() call corresponds to one episode. Is there a way (or a variable) to enforce a 1-1 match between one iteration and one episode without hardcoding as I did above?
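For example, something like this is what I would prefer over a magic number (a sketch; "min_episode_len" is a key I am assuming I could add to my_config, not an existing one):

# Assumed: my_config carries the known lower bound on episode length (>= 30 in
# my case), so the one-episode-per-iteration assumption lives in one place.
min_episode_len = my_config["min_episode_len"]  # hypothetical key
my_dqn_config = my_dqn_config.reporting(
    min_sample_timesteps_per_iteration=min_episode_len,
)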
It would have been a lot easier if I had seen this comment earlier.
I think overriding AlgorithmConfig or its subclass is a good idea. Thanks for the suggestion.
Meanwhile, what would happen if I set min_sample_timesteps_per_iteration to 0 or 1? Wouldn't that clip each train() iteration to exactly one episode (since I am batching by episode in the env_runner)?
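Something like this is what I have in mind (untested sketch):

# Untested: with batch_mode="complete_episodes", the env runner returns whole
# episodes only, so a minimum of 1 sampled timestep should already be satisfied
# by the first complete episode. The open question is whether the runner starts
# a second episode before the iteration is considered done.
my_dqn_config = my_dqn_config.reporting(min_sample_timesteps_per_iteration=1)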