DQN (and possibly other algorithms) should take the "num_envs_per_worker" config into account when computing the round-robin native_ratio used to determine the number of steps to use for training

A possible improvement:

When computing the native ratio below in DQNTrainer (which determines how many training steps to run relative to the number of steps added to the replay buffer), we should use:

native_ratio = (
    config["train_batch_size"]
    / (config["rollout_fragment_length"] * config["num_envs_per_worker"]))

instead of

native_ratio = (
    config["train_batch_size"] / config["rollout_fragment_length"])

This is because the effective "rollout_fragment_length" is ("rollout_fragment_length" * "num_envs_per_worker"), as described in the documentation of this config option:

# Divide episodes into fragments of this many steps each during rollouts.
# Sample batches of this size are collected from rollout workers and
# combined into a larger batch of `train_batch_size` for learning.
#
# For example, given rollout_fragment_length=100 and train_batch_size=1000:
#   1. RLlib collects 10 fragments of 100 steps each from rollout workers.
#   2. These fragments are concatenated and we perform an epoch of SGD.
#
# When using multiple envs per worker, the fragment size is multiplied by
# `num_envs_per_worker`. This is since we are collecting steps from
# multiple envs in parallel. For example, if num_envs_per_worker=5, then
# rollout workers will return experiences in chunks of 5*100 = 500 steps.
#
# The dataflow here can vary per algorithm. For example, PPO further
# divides the train batch into minibatches for multi-epoch SGD.
"rollout_fragment_length": 200,

This change is related to calculate_rr_weights in rllib/agents/dqn/dqn.py:

def calculate_rr_weights(config: TrainerConfigDict) -> List[float]:
    """Calculate the round robin weights for the rollout and train steps"""
    if not config["training_intensity"]:
        return [1, 1]
    # e.g., 32 / 4 -> native ratio of 8.0
    native_ratio = (
        config["train_batch_size"] / config["rollout_fragment_length"])
    # Training intensity is specified in terms of
    # (steps_replayed / steps_sampled), so adjust for the native ratio.
    weights = [1, config["training_intensity"] / native_ratio]
    return weights
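
A minimal sketch of the proposed change (keeping the same calculate_rr_weights signature, imports added for self-containedness; not a tested patch):

from typing import List

from ray.rllib.utils.typing import TrainerConfigDict


def calculate_rr_weights(config: TrainerConfigDict) -> List[float]:
    """Calculate the round robin weights for the rollout and train steps."""
    if not config["training_intensity"]:
        return [1, 1]
    # Effective fragment length: each sample() call from a worker returns
    # rollout_fragment_length * num_envs_per_worker steps.
    effective_fragment = (
        config["rollout_fragment_length"] * config["num_envs_per_worker"])
    # e.g., 1000 / (100 * 5) -> native ratio of 2.0
    native_ratio = config["train_batch_size"] / effective_fragment
    # Training intensity is specified in terms of
    # (steps_replayed / steps_sampled), so adjust for the native ratio.
    weights = [1, config["training_intensity"] / native_ratio]
    return weights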

In fact, since samples are also collected from "num_workers" rollout workers in parallel, we may even need to use:

native_ratio = (
    config["train_batch_size"]
    / (config["rollout_fragment_length"]
       * config["num_envs_per_worker"]
       * config["num_workers"]))

Thanks @Maxime_Riche! This looks totally reasonable. I'll create a PR.