'use_lstm' with centralized critic for PPO

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello, I am implementing a centralized value function for PPO, following the TwoStepGame example. I have managed to get training working without an LSTM, but as soon as I set `use_lstm`, I get a size-mismatch error:

(pid=14475)   File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
(pid=14475)     return forward_call(*input, **kwargs)
(pid=14475)   File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 677, in forward
(pid=14475)     self.check_forward_args(input, hx, batch_sizes)
(pid=14475)   File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 620, in check_forward_args
(pid=14475)     self.check_input(input, batch_sizes)
(pid=14475)   File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 207, in check_input
(pid=14475)     self.input_size, input.size(-1)))
(pid=14475) RuntimeError: input.size(-1) must be equal to input_size. Expected 63, got 276
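
For reference, this is roughly the config I am using; the env name, worker settings, and model sizes are placeholders for my actual setup (my model class is sketched at the end of this post), and removing the `use_lstm` line makes training run fine:

from ray.rllib.models import ModelCatalog

# Register the centralized-critic model, as in the example.
ModelCatalog.register_custom_model("cc_model", TorchCentralizedCriticModel)

config = {
    "env": "my_multiagent_env",      # placeholder for my actual environment
    "framework": "torch",
    "model": {
        "custom_model": "cc_model",
        "fcnet_hiddens": [64, 64],   # placeholder sizes
        "use_lstm": True,            # enabling this triggers the error above
    },
    # multiagent policy mapping etc. omitted
}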

If we override the PPO policy as in the example, i.e. if we have:

CCPPOTorchPolicy = PPOTorchPolicy.with_updates(
    name="CCPPOTorchPolicy",
    postprocess_fn=centralized_critic_postprocessing,
    loss_fn=loss_with_central_critic,
    before_init=setup_torch_mixins,
    mixins=[
        TorchLR, TorchEntropyCoeffSchedule, TorchKLCoeffMixin,
        CentralizedValueMixin
    ])
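
and the trainer is then built from this policy exactly as in the example; roughly (I only run the torch branch, so the TF policy is omitted):

from ray.rllib.agents.ppo import PPOTrainer

def get_policy_class(config):
    # I only use framework="torch", so always return the torch policy.
    return CCPPOTorchPolicy

CCTrainer = PPOTrainer.with_updates(
    name="CCPPOTrainer",
    default_policy=CCPPOTorchPolicy,
    get_policy_class=get_policy_class,
)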

Should we still expect `use_lstm` to work out of the box here, or would we need to accommodate the wrapped model ourselves? I have looked into the LSTM wrapper's code, but I am lost, since the model itself seems to be fine: training proceeds as expected without `use_lstm`.
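
For completeness, my model is essentially the `TorchCentralizedCriticModel` from the example, just with my own observation/action sizes; a rough sketch (I assume a Box observation space and Discrete actions here, and the sizes are placeholders):

import torch
import torch.nn as nn

from ray.rllib.models.torch.fcnet import FullyConnectedNetwork as TorchFC
from ray.rllib.models.torch.misc import SlimFC
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class TorchCentralizedCriticModel(TorchModelV2, nn.Module):
    """Policy net plus a central value function over both agents' obs and actions."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        # Per-agent base model (this is the part that 'use_lstm' would wrap).
        self.model = TorchFC(obs_space, action_space, num_outputs, model_config,
                             name)

        # Central VF: own obs + opponent obs + one-hot opponent action -> value.
        central_vf_input = 2 * obs_space.shape[0] + action_space.n
        self.central_vf = nn.Sequential(
            SlimFC(central_vf_input, 16, activation_fn=nn.Tanh),
            SlimFC(16, 1),
        )

    def forward(self, input_dict, state, seq_lens):
        model_out, _ = self.model(input_dict, state, seq_lens)
        return model_out, []

    def central_value_function(self, obs, opponent_obs, opponent_actions):
        inputs = torch.cat([
            obs,
            opponent_obs,
            nn.functional.one_hot(opponent_actions.long(),
                                  self.action_space.n).float(),
        ], dim=1)
        return torch.reshape(self.central_vf(inputs), [-1])

    def value_function(self):
        # Not used; the centralized value function above is called via the mixin.
        return self.model.value_function()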