DQN (and possibly other algorithms) should take the "num_envs_per_worker" config into account when computing the round-robin native_ratio used to determine the number of steps to use for training

A possible improvement:

When computing the native ratio below in DQNTrainer (which determines how many training steps to run relative to the number of steps added to the replay buffer), we should use:

native_ratio = (
    config["train_batch_size"]
    / (config["rollout_fragment_length"] * config["num_envs_per_worker"]))

instead of

native_ratio = (
    config["train_batch_size"] / config["rollout_fragment_length"])

This is because the effective "rollout_fragment_length" is ("rollout_fragment_length" * "num_envs_per_worker"), as described in the documentation of this config option:

# Divide episodes into fragments of this many steps each during rollouts.
# Sample batches of this size are collected from rollout workers and
# combined into a larger batch of `train_batch_size` for learning.
#
# For example, given rollout_fragment_length=100 and train_batch_size=1000:
#   1. RLlib collects 10 fragments of 100 steps each from rollout workers.
#   2. These fragments are concatenated and we perform an epoch of SGD.
#
# When using multiple envs per worker, the fragment size is multiplied by
# `num_envs_per_worker`. This is since we are collecting steps from
# multiple envs in parallel. For example, if num_envs_per_worker=5, then
# rollout workers will return experiences in chunks of 5*100 = 500 steps.
#
# The dataflow here can vary per algorithm. For example, PPO further
# divides the train batch into minibatches for multi-epoch SGD.
"rollout_fragment_length": 200,

This change is related to calculate_rr_weights in rllib/agents/dqn/dqn.py:

def calculate_rr_weights(config: TrainerConfigDict) -> List[float]:
    """Calculate the round robin weights for the rollout and train steps"""
    if not config["training_intensity"]:
        return [1, 1]
    # e.g., 32 / 4 -> native ratio of 8.0
    native_ratio = (
        config["train_batch_size"] / config["rollout_fragment_length"])
    # Training intensity is specified in terms of
    # (steps_replayed / steps_sampled), so adjust for the native ratio.
    weights = [1, config["training_intensity"] / native_ratio]
    return weights
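
A minimal sketch of the proposed change (keeping the same calculate_rr_weights signature, imports added for self-containedness; not a tested patch):

from typing import List

from ray.rllib.utils.typing import TrainerConfigDict


def calculate_rr_weights(config: TrainerConfigDict) -> List[float]:
    """Calculate the round robin weights for the rollout and train steps."""
    if not config["training_intensity"]:
        return [1, 1]
    # Effective fragment length: each sample() call from a worker returns
    # rollout_fragment_length * num_envs_per_worker steps.
    effective_fragment = (
        config["rollout_fragment_length"] * config["num_envs_per_worker"])
    # e.g., 1000 / (100 * 5) -> native ratio of 2.0
    native_ratio = config["train_batch_size"] / effective_fragment
    # Training intensity is specified in terms of
    # (steps_replayed / steps_sampled), so adjust for the native ratio.
    weights = [1, config["training_intensity"] / native_ratio]
    return weights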

In fact, since samples are also collected from "num_workers" rollout workers in parallel, we may even need to use:

native_ratio = (
    config["train_batch_size"]
    / (config["rollout_fragment_length"]
       * config["num_envs_per_worker"]
       * config["num_workers"]))

Thanks @Maxime_Riche! This looks totally reasonable. I'll create a PR.