I am using the default config settings for ray.rllib.agents.dqn.DQNTrainer(config=config, env=select_env).
When I run one call of the DQNTrainer.train() method, it reports an arbitrary number of episodes and an arbitrary number of training iterations.
Can someone please help me understand, step by step, how the DQNTrainer.train() method decides when to stop?
You always get one training iteration upon calling trainer.train(). You can check this via trainer.iteration afterwards.
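For example (a minimal sketch; the exact config keys and result fields can vary across Ray versions, and "CartPole-v0" is just a placeholder env):

    import ray
    from ray.rllib.agents.dqn import DQNTrainer

    ray.init()
    trainer = DQNTrainer(config={"num_workers": 0}, env="CartPole-v0")

    result = trainer.train()              # exactly one training iteration
    print(trainer.iteration)              # -> 1
    print(result["episodes_this_iter"])   # varies with episode lengths
    print(result["timesteps_total"])      # env steps sampled so far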
The number of episodes depends on the environment (e.g. does it terminate early?).
The number of env steps, in turn, depends on the algorithm's execution plan. Some algos perform a fixed number of update steps per iteration, e.g. PPO. Others have a timing mechanism, like DQN and IMPALA.
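For DQN, the relevant timing knobs are roughly these (a sketch based on the Ray 1.x defaults; names and default values may differ in other versions):

    config = {
        # Minimum number of new env steps that must be sampled
        # before a train() call may return.
        "timesteps_per_iteration": 1000,
        # Minimum wall-clock seconds per iteration (common Trainer config).
        "min_iter_time_s": 0,
    }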
In particular, DQN has this in its execution plan (ray/rllib/agents/dqn/dqn.py):
    # These weights determine the ratio between
    # a) sample-collection-and-storing steps and
    # b) sample-from-buffer-and-train-one-step updates.
    train_op = Concurrently(
        [store_op, replay_op],
        mode="round_robin",
        output_indexes=[1],
        round_robin_weights=calculate_rr_weights(config))
This round-robin weighting causes each iteration to contain a slightly different number of steps.
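For reference, calculate_rr_weights looks roughly like this in Ray 1.x (reproduced from memory, so treat it as a sketch rather than the exact source):

    def calculate_rr_weights(config):
        """Ratio of replay/train ops to sample/store ops."""
        if not config["training_intensity"]:
            return [1, 1]
        # Native ratio of trained steps to sampled steps per pair of ops,
        # e.g. train_batch_size=32, rollout_fragment_length=4 -> 8.0.
        native_ratio = (
            config["train_batch_size"] / config["rollout_fragment_length"])
        # training_intensity is specified as (steps_trained / steps_sampled),
        # so rescale by the native ratio.
        return [1, config["training_intensity"] / native_ratio]

With the default training_intensity of None this returns [1, 1], i.e. one sample-and-store op alternating with one replay-and-train op.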