[rllib/lib/python3.8/site-packages/ray/rllib/policy/rnn_sequencing -- chop_into_sequences]: computed seq_lens is wrong

Hi, Contributor and Users of Ray,

I found an issue in Ray 2.1.0.
In chop_into_sequences, sum(seq_lens) is expected to equal len(f) for every feature column f (the padding loop asserts i == len(f)), but here it does not.

The algorithm I used is QMIX, and the environment adds group info via MyEnv(myconfig).with_agent_groups(grouping, obs_space=obs_space, act_space=act_space).
The error happens while computing the padding for the feature group_rewards, on the 4th call to the function.
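For context, a toy illustration of what agent grouping does to rewards (this is not RLlib code, just a sketch of the idea): per-agent rewards are collected into one list per group, so grouped columns such as group_rewards can end up with a different length or shape than the per-agent id columns.

```python
# Toy sketch of reward grouping (names and values are made up):
# each group's reward is the list of its member agents' rewards.
grouping = {"group_1": ["agent_0", "agent_1"]}
per_agent_rewards = {"agent_0": 1.0, "agent_1": 0.5}

group_rewards = {
    group: [per_agent_rewards[agent] for agent in agents]
    for group, agents in grouping.items()
}
print(group_rewards)  # {'group_1': [1.0, 0.5]}
```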

Does anybody know what is wrong here, and how I can fix it without modifying the package?

full error stack:

ray.exceptions.RayTaskError(IndexError): ray::QMix.train() (pid=84207, ip=127.0.0.1, repr=QMix)
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 349, in train
    result = self.step()
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 673, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 2588, in _run_one_training_iteration
    num_recreated += self.try_recover_from_step_attempt(
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 2386, in try_recover_from_step_attempt
    raise error
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 2583, in _run_one_training_iteration
    results = self.training_step()
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/qmix/qmix.py", line 274, in training_step
    train_results = multi_gpu_train_one_step(self, train_batch)
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/execution/train_ops.py", line 176, in multi_gpu_train_one_step
    results = policy.learn_on_loaded_batch(
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/policy/torch_policy.py", line 563, in learn_on_loaded_batch
    return self.learn_on_batch(batch)
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/qmix/qmix_policy.py", line 366, in learn_on_batch
    output_list, _, seq_lens = chop_into_sequences(
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/policy/rnn_sequencing.py", line 331, in chop_into_sequences
    f_pad[seq_base + seq_offset] = f[i]
IndexError: index 16024 is out of bounds for axis 0 with size 16024

seq_lens is computed by

    if seq_lens is None or len(seq_lens) == 0:
        prev_id = None
        seq_lens = []
        seq_len = 0
        unique_ids = np.add(
            np.add(episode_ids, agent_indices),
            np.array(unroll_ids, dtype=np.int64) << 32,
        )
        for uid in unique_ids:
            if (prev_id is not None and uid != prev_id) or seq_len >= max_seq_len:
                seq_lens.append(seq_len)
                seq_len = 0
            seq_len += 1
            prev_id = uid
        if seq_len:
            seq_lens.append(seq_len)
        seq_lens = np.array(seq_lens, dtype=np.int32)
    print("seq_lens", len(seq_lens), max(seq_lens), sum(seq_lens))

output: seq_lens 1644 10 16180
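One useful observation (shown with a standalone re-run of the logic above; the helper name compute_seq_lens is mine): each uid in unique_ids increments seq_len exactly once, so sum(seq_lens) always equals len(episode_ids) by construction. The 16180 is therefore the length of the id columns, which means the group_rewards column (16024 rows) is shorter than episode_ids; the mismatch is between batch columns, not inside the seq_lens loop itself.

```python
import numpy as np

def compute_seq_lens(episode_ids, agent_indices, unroll_ids, max_seq_len):
    # Standalone copy of the seq_lens logic from chop_into_sequences.
    seq_lens = []
    seq_len = 0
    prev_id = None
    unique_ids = np.add(
        np.add(episode_ids, agent_indices),
        np.array(unroll_ids, dtype=np.int64) << 32,
    )
    for uid in unique_ids:
        if (prev_id is not None and uid != prev_id) or seq_len >= max_seq_len:
            seq_lens.append(seq_len)
            seq_len = 0
        seq_len += 1  # every timestep is counted exactly once
        prev_id = uid
    if seq_len:
        seq_lens.append(seq_len)
    return np.array(seq_lens, dtype=np.int32)

# Two episodes of the same agent in one unroll: 3 + 2 timesteps.
episode_ids = np.array([1, 1, 1, 2, 2], dtype=np.int64)
agent_indices = np.array([0, 0, 0, 0, 0], dtype=np.int64)
unroll_ids = np.array([7, 7, 7, 7, 7], dtype=np.int64)

seq_lens = compute_seq_lens(episode_ids, agent_indices, unroll_ids, max_seq_len=10)
print(seq_lens.tolist(), int(seq_lens.sum()))  # [3, 2] 5 == len(episode_ids)
```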

Then f_pad is computed by:

    for col in feature_columns:
        if isinstance(col, list):
            col = np.array(col)
        feature_sequences.append([])

        for f in tree.flatten(col):
            # Save unnecessary copy.
            if not isinstance(f, np.ndarray):
                f = np.array(f)

            length = len(seq_lens) * max_seq_len
            if f.dtype == object or f.dtype.type is np.str_:
                f_pad = [None] * length
            else:
                # Make sure type doesn't change.
                f_pad = np.zeros((length,) + np.shape(f)[1:], dtype=f.dtype)

            print('f', len(f_pad), len(f))

            seq_base = 0
            i = 0
            for len_ in seq_lens:
                for seq_offset in range(len_):
                    f_pad[seq_base + seq_offset] = f[i]
                    i += 1
                seq_base += max_seq_len
            assert i == len(f), f
            feature_sequences[-1].append(f_pad)

output: f 16440 16024
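To show why this combination fails, here is a minimal sketch of the padding loop above (pad_feature is my name, not RLlib's): the loop consumes exactly sum(seq_lens) elements of f, so any feature column shorter than that, like group_rewards here (16024 < 16180), hits the same IndexError as in the traceback.

```python
import numpy as np

def pad_feature(f, seq_lens, max_seq_len):
    # Mirrors the padding loop: copy each sequence into a zero-padded
    # buffer of shape (num_seqs * max_seq_len, ...).
    f = np.asarray(f)
    f_pad = np.zeros((len(seq_lens) * max_seq_len,) + f.shape[1:], dtype=f.dtype)
    seq_base = 0
    i = 0
    for len_ in seq_lens:
        for seq_offset in range(len_):
            f_pad[seq_base + seq_offset] = f[i]
            i += 1
        seq_base += max_seq_len
    assert i == len(f), f
    return f_pad

seq_lens = [3, 2]

# Matching column: 5 rows == sum(seq_lens) -> works.
ok = pad_feature(np.arange(5), seq_lens, max_seq_len=4)
print(ok.tolist())  # [0, 1, 2, 0, 3, 4, 0, 0]

# Short column: 4 rows < sum(seq_lens) -> same failure mode as group_rewards.
try:
    pad_feature(np.arange(4), seq_lens, max_seq_len=4)
    caught = False
except IndexError:
    caught = True
print("short column raises IndexError:", caught)
```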

It is a custom environment, and it works well with other algorithms.
The config I customized includes:

        algo_config = CONFIGS["QMIX"]().framework(
            framework='torch'
        ).environment(
            env=config_dict["env"],
            env_config=config_dict["env_args"],
            observation_space=obs_space,
            action_space=act_space,
        ).evaluation(
            evaluation_interval=config_dict["evaluation_interval"],
            custom_evaluation_function=config_dict["custom_evaluation_function"],
            evaluation_parallel_to_training=False,
            enable_async_evaluation=False,
            evaluation_num_workers=0
        ).exploration(
            exploration_config=config_dict['exploration_config']
        ).training(
            train_batch_size=config_dict['batch_episode'] * env_info_dict["episode_limit"],
            model=model_config,
            **(config_dict['algo_args'])
        ).resources(
            num_gpus_per_worker=config_dict["num_gpus_per_worker"],
            num_gpus=config_dict["num_gpus"],
        ).rollouts(
            num_rollout_workers=config_dict["num_workers"],
            rollout_fragment_length=config_dict['rollout_fragment_length']
        )

Thanks for your help!