Hi, contributors and users of Ray,

I found an issue in Ray 2.1.0: in chop_into_sequences, sum(seq_lens) is expected to equal the length of each feature column (len(f)), but here it does not. The algorithm I use is QMIX, and the environment adds group info via MyEnv(myconfig).with_agent_groups(grouping, obs_space=obs_space, act_space=act_space). The error happens while computing the padding for the feature group_rewards, on the 4th call to the function. Does anybody know what is wrong here, and how I should fix it without changing the package?
Full error stack:
ray.exceptions.RayTaskError(IndexError): ray::QMix.train() (pid=84207, ip=127.0.0.1, repr=QMix)
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 349, in train
    result = self.step()
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 673, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 2588, in _run_one_training_iteration
    num_recreated += self.try_recover_from_step_attempt(
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 2386, in try_recover_from_step_attempt
    raise error
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 2583, in _run_one_training_iteration
    results = self.training_step()
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/qmix/qmix.py", line 274, in training_step
    train_results = multi_gpu_train_one_step(self, train_batch)
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/execution/train_ops.py", line 176, in multi_gpu_train_one_step
    results = policy.learn_on_loaded_batch(
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/policy/torch_policy.py", line 563, in learn_on_loaded_batch
    return self.learn_on_batch(batch)
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/algorithms/qmix/qmix_policy.py", line 366, in learn_on_batch
    output_list, _, seq_lens = chop_into_sequences(
  File "/opt/anaconda3/envs/rllib/lib/python3.8/site-packages/ray/rllib/policy/rnn_sequencing.py", line 331, in chop_into_sequences
    f_pad[seq_base + seq_offset] = f[i]
IndexError: index 16024 is out of bounds for axis 0 with size 16024
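For anyone skimming: the failing line copies feature rows into a padded buffer, and the IndexError fires on f[i] when i runs past the end of the feature column. A standalone toy sketch of that copy loop (my own simplification, not the RLlib source) reproduces the same failure whenever a column has fewer rows than sum(seq_lens):

```python
import numpy as np

def pad_column(f, seq_lens, max_seq_len):
    """Copy rows of f into a zero-padded buffer, one max_seq_len slot per sequence."""
    f_pad = np.zeros((len(seq_lens) * max_seq_len,) + f.shape[1:], dtype=f.dtype)
    seq_base = 0
    i = 0
    for len_ in seq_lens:
        for seq_offset in range(len_):
            f_pad[seq_base + seq_offset] = f[i]  # IndexError here if i >= len(f)
            i += 1
        seq_base += max_seq_len
    return f_pad

seq_lens = [3, 2]  # sum(seq_lens) == 5, so 5 rows are required
ok = pad_column(np.arange(5), seq_lens, max_seq_len=4)  # succeeds
try:
    pad_column(np.arange(4), seq_lens, max_seq_len=4)   # one row short
except IndexError as e:
    print("IndexError:", e)
```

With 5 input rows the copy succeeds; dropping one row from the column triggers exactly the "index N is out of bounds for axis 0 with size N" error shown above.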
seq_lens is computed by:
if seq_lens is None or len(seq_lens) == 0:
    prev_id = None
    seq_lens = []
    seq_len = 0
    unique_ids = np.add(
        np.add(episode_ids, agent_indices),
        np.array(unroll_ids, dtype=np.int64) << 32,
    )
    for uid in unique_ids:
        if (prev_id is not None and uid != prev_id) or seq_len >= max_seq_len:
            seq_lens.append(seq_len)
            seq_len = 0
        seq_len += 1
        prev_id = uid
    if seq_len:
        seq_lens.append(seq_len)
    seq_lens = np.array(seq_lens, dtype=np.int32)
print("seq_lens", len(seq_lens), max(seq_lens), sum(seq_lens))  # my debug print
Output: seq_lens 1644 10 16180 (count, max, sum).
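For reference, here is the same splitting logic run on toy data (a self-contained sketch of the snippet above, with made-up episode/unroll ids): a new sequence starts whenever the combined id changes or max_seq_len is reached, so sum(seq_lens) always equals the number of input rows, i.e. len(episode_ids).

```python
import numpy as np

def compute_seq_lens(episode_ids, agent_indices, unroll_ids, max_seq_len):
    # Combine episode id, agent index, and unroll id into one id per timestep.
    unique_ids = np.add(
        np.add(episode_ids, agent_indices),
        np.array(unroll_ids, dtype=np.int64) << 32,
    )
    seq_lens, prev_id, seq_len = [], None, 0
    for uid in unique_ids:
        if (prev_id is not None and uid != prev_id) or seq_len >= max_seq_len:
            seq_lens.append(seq_len)
            seq_len = 0
        seq_len += 1
        prev_id = uid
    if seq_len:
        seq_lens.append(seq_len)
    return np.array(seq_lens, dtype=np.int32)

# Two episodes of lengths 5 and 3, a single agent, one unroll each:
episode_ids = np.array([1] * 5 + [2] * 3, dtype=np.int64)
agent_indices = np.zeros(8, dtype=np.int64)
unroll_ids = np.array([0] * 5 + [1] * 3, dtype=np.int64)
print(compute_seq_lens(episode_ids, agent_indices, unroll_ids, max_seq_len=4))
# -> [4 1 3]; the 5-step episode is split at max_seq_len=4, and 4 + 1 + 3 == 8 rows
```

Note that the splitting only consumes the id columns, so sum(seq_lens) reflects how many rows the id columns have, regardless of how many rows the feature columns have.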
Then f_pad is computed by:
for col in feature_columns:
    if isinstance(col, list):
        col = np.array(col)
    feature_sequences.append([])
    for f in tree.flatten(col):
        # Save unnecessary copy.
        if not isinstance(f, np.ndarray):
            f = np.array(f)
        length = len(seq_lens) * max_seq_len
        if f.dtype == object or f.dtype.type is np.str_:
            f_pad = [None] * length
        else:
            # Make sure type doesn't change.
            f_pad = np.zeros((length,) + np.shape(f)[1:], dtype=f.dtype)
        print('f', len(f_pad), len(f))  # my debug print
        seq_base = 0
        i = 0
        for len_ in seq_lens:
            for seq_offset in range(len_):
                f_pad[seq_base + seq_offset] = f[i]
                i += 1
            seq_base += max_seq_len
        assert i == len(f), f
        feature_sequences[-1].append(f_pad)
Output: f 16440 16024 (len(f_pad), len(f)).
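The numbers are consistent with that reading: f_pad is sized from seq_lens, but the copy loop needs sum(seq_lens) rows from group_rewards and only 16024 exist. A quick arithmetic check:

```python
n_seqs, max_seq_len = 1644, 10   # from the seq_lens printout above
sum_seq_lens = 16180             # sum(seq_lens) from the same printout
len_f = 16024                    # rows actually present in group_rewards

print(n_seqs * max_seq_len)      # 16440 = len(f_pad), matching "f 16440 16024"
print(sum_seq_lens - len_f)      # 156 rows missing -> IndexError at f[16024]
```

So the id columns (episode_ids, agent_indices, unroll_ids) describe 16180 timesteps while group_rewards carries only 16024 rows; the mismatch is already present in the batch before chop_into_sequences runs.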
It's a special environment, and it works well with other algorithms. The config I specialized includes:
algo_config = CONFIGS["QMIX"]().framework(
    framework='torch'
).environment(
    env=config_dict["env"],
    env_config=config_dict["env_args"],
    observation_space=obs_space,
    action_space=act_space,
).evaluation(
    evaluation_interval=config_dict["evaluation_interval"],
    custom_evaluation_function=config_dict["custom_evaluation_function"],
    evaluation_parallel_to_training=False,
    enable_async_evaluation=False,
    evaluation_num_workers=0
).exploration(
    exploration_config=config_dict['exploration_config']
).training(
    train_batch_size=config_dict['batch_episode'] * env_info_dict["episode_limit"],
    model=model_config,
    **(config_dict['algo_args'])
).resources(
    num_gpus_per_worker=config_dict["num_gpus_per_worker"],
    num_gpus=config_dict["num_gpus"],
).rollouts(
    num_rollout_workers=config_dict["num_workers"],
    rollout_fragment_length=config_dict['rollout_fragment_length']
)
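If it helps to narrow this down without patching the package: since chop_into_sequences assumes every column has the same row count, one could compare the row counts of all columns in a sampled batch before training. This is a hypothetical helper of my own (check_column_lengths is not an RLlib API; the batch is treated here as a plain dict of arrays):

```python
import numpy as np

def check_column_lengths(batch: dict) -> dict:
    """Return the columns whose row count differs from the majority length."""
    lengths = {k: len(v) for k, v in batch.items()}
    counts = list(lengths.values())
    ref = max(set(counts), key=counts.count)  # most common length
    return {k: n for k, n in lengths.items() if n != ref}

batch = {
    "eps_id": np.zeros(10),
    "obs": np.zeros((10, 4)),
    "group_rewards": np.zeros(9),   # one row short, like the failure above
}
print(check_column_lengths(batch))  # -> {'group_rewards': 9}
```

Running something like this on a batch sampled from the grouped environment should confirm which column (presumably group_rewards) diverges, and at what point in collection the divergence starts.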
Thanks for your help!