How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have a custom environment, and I can train on it with a standard feedforward network using RLlib version 2.6.1 and the PPO algorithm. I wanted to see whether training improves with an LSTM, so I set `use_lstm` to `True` and `vf_share_layers` to `False`, roughly as in the sketch below.
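For context, this is a minimal sketch of the relevant settings, not my actual code; the environment name is a placeholder and the rest of my config is omitted:

```python
# Minimal sketch, assuming a registered custom environment "MyCustomEnv-v0".
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("MyCustomEnv-v0")
    .framework("torch")
    .training(
        model={
            "use_lstm": True,        # enabling the LSTM wrapper triggers the error
            "vf_share_layers": False,
            # "max_seq_len" is left at its default of 20
        },
    )
)
algo = config.build()
result = algo.train()
```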
However, with this configuration I get the following error:
```
Exception has occurred: IndexError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
The shape of the mask [128, 11] at index 1 does not match the shape of the indexed tensor [128, 20] at index 1
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/algorithms/ppo/torch/ppo_torch_learner.py", line 59, in possibly_masked_mean
    return torch.sum(t[mask]) / num_valid
           ~^^^^^^
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/algorithms/ppo/torch/ppo_torch_learner.py", line 88, in compute_loss_for_module
    mean_kl_loss = possibly_masked_mean(action_kl)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/core/learner/learner.py", line 995, in compute_loss
    loss = self.compute_loss_for_module(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 123, in _uncompiled_update
    loss_per_module = self.compute_loss(fwd_out=fwd_out, batch=batch)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 365, in _update
    return self._possibly_compiled_update(batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/core/learner/learner.py", line 1220, in update
    ) = self._update(nested_tensor_minibatch)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/core/learner/learner_group.py", line 184, in update
    self._learner.update(
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 448, in training_step
    train_results = self.learner_group.update(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 2837, in _run_one_training_iteration
    results = self.training_step()
              ^^^^^^^^^^^^^^^^^^^^
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/rllib/algorithms/algorithm.py", line 853, in step
    results, train_iter_ctx = self._run_one_training_iteration()
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 375, in train
    raise skipped from exception_cause(skipped)
  File "/Users/paula/Desktop/Projects/venvs/L2RPN_080_RLLIB_261_Grid2OP_195/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 375, in train
    raise skipped from exception_cause(skipped)
  File "/Users/paula/Desktop/Projects/RL Practice/RLLIB_Practice4/train_LSTM.py", line 239, in train
    result = agent.nn_model.train()
```
I am not sharing the full code as it is quite complicated, but I did some debugging and found that the error occurs on this line: https://github.com/ray-project/ray/blob/a2d38078d3a2f502c0e22c1132745e206181810c/rllib/algorithms/ppo/torch/ppo_torch_learner.py#L59 Here the tensor `t` that gets passed in is actually the `action_kl` variable. The reason for the failure is clear: `mask` is a boolean matrix with a different number of `True` entries in each row, and while `t.shape` is `[128, 20]`, `mask.shape` is `[128, 11]`, so the boolean indexing fails.
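The mismatch can be reproduced in isolation with plain PyTorch; this is just an illustration of the indexing rule, not RLlib code:

```python
import torch

t = torch.zeros(128, 20)                       # stands in for action_kl, padded to max_seq_len = 20
mask = torch.ones(128, 11, dtype=torch.bool)   # stands in for the sequence mask built from maxlen = 11

try:
    # Boolean-mask indexing requires the mask shape to match the tensor shape,
    # which is why torch.sum(t[mask]) / num_valid fails in possibly_masked_mean.
    masked = t[mask]
except IndexError as err:
    print(err)  # "The shape of the mask [128, 11] at index 1 does not match ..."
```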
Now, `mask` is computed here: https://github.com/ray-project/ray/blob/fd9a02e9cef9cff0e58e99274622c651e1227f4c/rllib/algorithms/ppo/torch/ppo_torch_learner.py#L55 and `maxlen` comes out as 11 because `batch[SampleBatch.SEQ_LENS]` has the following values:

```
tensor([ 4., 4., 6., 3., 3., 2., 1., 5., 3., 3., 4., 3., 1., 8., 2., 5., 3., 7., 1., 4., 6., 4., 9., 7., 7., 1., 6., 6., 1., 7., 2., 3., 4., 2., 7., 1., 1., 4., 1., 5., 10., 7., 5., 6., 2., 3., 8., 1., 1., 9., 3., 5., 1., 2., 3., 1., 3., 2., 5., 3., 2., 4., 1., 4., 5., 2., 2., 4., 2., 2., 2., 3., 3., 4., 3., 1., 2., 4., 1., 5., 5., 2., 2., 3., 4., 11., 1., 4., 3., 5., 1., 5., 3., 5., 3., 3., 3., 3., 2., 1., 3., 5., 4., 3., 1., 3., 4., 3., 5., 3., 4., 4., 4., 4., 3., 2., 3., 10., 6., 1., 11., 2., 2., 6., 4., 1., 6., 3.])
```

Since the highest value is 11, `mask` ends up with a shape of `[128, 11]`.
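For reference, here is a small sketch of how I understand the mask gets that shape, assuming the linked code builds it with RLlib's `sequence_mask` helper (the sequence lengths below are just an excerpt of the tensor above):

```python
import torch
from ray.rllib.utils.torch_utils import sequence_mask

seq_lens = torch.tensor([4., 4., 6., 3., 11., 2.])   # excerpt of batch[SampleBatch.SEQ_LENS]
maxlen = int(torch.max(seq_lens).item())             # -> 11, the longest sequence in the batch
mask = sequence_mask(seq_lens, maxlen)               # row i is True for the first seq_lens[i] steps
print(mask.shape)                                    # torch.Size([6, 11]); with 128 sequences -> [128, 11]
```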
My question is: do I need to set some configuration values differently, or is this an issue in the RLlib code? Any feedback and suggestions would be appreciated!