Structure sequence length mismatch in SGD code for PPO policy

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am trying to train a PPO policy in a custom environment. After training for around 50 iterations, the following error is thrown:

ERROR trial_runner.py:1088 -- Trial experiment_HierarchicalGraphColorEnv_bc37e_00000: Error processing event.
ray.exceptions.RayTaskError(ValueError): ray::ImplicitFunc.train() (pid=8039, ip=172.10.3.120, repr=experiment)
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 367, in train
    raise skipped from exception_cause(skipped)
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/tune/trainable/function_trainable.py", line 338, in entrypoint
    self._status_reporter.get_checkpoint(),
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/tune/trainable/function_trainable.py", line 652, in _trainable_func
    output = fn()
  File "/home/venkatakeerthy.cs.iith/ML-Register-Allocation/model/RegAlloc/ggnn_drl/rllib_split_model/src/experiment_ppo.py", line 59, in experiment
    train_results = train_agent.train()
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 367, in train
    raise skipped from exception_cause(skipped)
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 364, in train
    result = self.step()
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 749, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 2623, in _run_one_training_iteration
    results = self.training_step()
  File "/home/venkatakeerthy.cs.iith/ML-Register-Allocation/model/RegAlloc/ggnn_drl/rllib_split_model/src/ppo_new.py", line 379, in training_step
    train_results = train_one_step(self, train_batch)
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/rllib/execution/train_ops.py", line 62, in train_one_step
    [],
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/rllib/utils/sgd.py", line 135, in do_minibatch_sgd
    learner_info = learner_info_builder.finalize()
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/ray/rllib/utils/metrics/learner_info.py", line 87, in finalize
    _all_tower_reduce, *results_all_towers
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/tree/__init__.py", line 550, in map_structure_with_path
    **kwargs)
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/tree/__init__.py", line 841, in map_structure_with_path_up_to
    shallow_structure, input_tree, check_types=check_types)
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/tree/__init__.py", line 684, in _assert_shallow_structure
    shallow_branch, input_branch, check_types=check_types)
  File "/home/users/anaconda3/envs/conda-env/lib/python3.7/site-packages/tree/__init__.py", line 664, in _assert_shallow_structure
    shallow_length=_num_elements(shallow_tree)))
ValueError: The two structures don't have the same sequence length. Input structure has length 10, while shallow structure has length 11.
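For context on the mechanism (my reading of the traceback, not the actual RLlib code): `LearnerInfoBuilder.finalize()` reduces the per-tower results with dm-tree's `map_structure_with_path`, which requires every input structure to have exactly the same shape. A minimal sketch of that kind of structure-wise reduction failing when one tower's result list has 10 entries and another's has 11:

```python
# Minimal sketch (NOT the actual RLlib code) of a structure-wise
# reduction that fails when inputs have different sequence lengths.
def reduce_tower_results(results):
    lengths = {len(r) for r in results}
    if len(lengths) != 1:
        # Mimics the dm-tree structure check seen in the traceback.
        raise ValueError(
            "The two structures don't have the same sequence length. "
            f"Input structure has length {max(lengths)}, "
            f"while shallow structure has length {min(lengths)}."
        )
    # Element-wise mean across towers.
    return [sum(vals) / len(vals) for vals in zip(*results)]

tower_a = [0.1] * 10  # one tower reported 10 stat entries
tower_b = [0.2] * 11  # another reported 11
try:
    reduce_tower_results([tower_a, tower_b])
except ValueError as e:
    print(e)
```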

I recently upgraded Ray from 1.4 to 2.2.0. In the newer version (2.2.0), the code that collects SGD result info has changed, and the error occurs only in that changed code.

Any help in understanding the issue better or fixing it is really appreciated.
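One hypothesis I am checking (an assumption on my part, not confirmed): a per-minibatch stats dict that occasionally gains or loses an entry, e.g. a metric logged only under certain conditions, would produce exactly this 10-vs-11 length mismatch at finalize time. A small sanity-check helper along those lines (the helper is my own sketch, not an RLlib API):

```python
# Hypothetical debugging helper: record the set of stat keys seen on
# the first call and raise as soon as a later call reports a
# different set, pinpointing where the structure starts to drift.
_expected_keys = None

def check_stats_keys(stats):
    global _expected_keys
    keys = sorted(stats)
    if _expected_keys is None:
        _expected_keys = keys  # remember the first-seen key set
    elif keys != _expected_keys:
        raise RuntimeError(
            f"Stat keys changed: {_expected_keys} -> {keys}"
        )

check_stats_keys({"total_loss": 0.5, "kl": 0.01})
check_stats_keys({"total_loss": 0.4, "kl": 0.02})  # same keys: OK
```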

I am still facing this issue, can someone help with this?

@Siddharth_Jain could you provide a reproducible example? I can take a look at it.