I’m trying to export an R2D2 + LSTM model (with either the built-in or a custom LSTM) to ONNX. I have successfully exported a PPO + LSTM (built-in) model on both Ray 2.6.1 and Ray 2.41.
I’m getting an error where the state becomes an empty list ([]).
I saw these posts about the state becoming an empty list ([]) / an empty state.
In my debugging statements during the ONNX export, I saw len(state) == 2 (hidden state, cell state) several times, and then suddenly len(state) == 0, i.e. state == [].
I don’t think a recurrent network should ever be passed an empty state. Is this a bug, or is my code wrong somewhere? How can I resolve the error below, or in which version of Ray is this fixed? Thanks.
Versions:
- ray 2.6.1
- onnx 1.16.1
- onnx2pytorch 0.5.1
- torch 2.5.1
- torchvision 0.20.1
- Python 3.9.0
- Windows 10 Pro
```python
from ray.rllib.algorithms.r2d2 import R2D2Config

config = (
    R2D2Config()
    .environment("CartPole-v1")  # Replace with your environment
    .framework("torch")  # Use PyTorch framework
    .training(
        model={
            "use_lstm": True,
            "max_seq_len": 50,
            "lstm_cell_size": 256,
            "fcnet_hiddens": [256],
            "lstm_use_prev_action": False,
        }
    )
)
```
Code:

```python
import torch
import torch.nn as nn
from ray.rllib.policy.sample_batch import SampleBatch


class ModelWrapper(nn.Module):
    def __init__(self, model):
        super(ModelWrapper, self).__init__()
        self.model = model

    def forward(self, obs, state_in_h, state_in_c, prev_actions):
        # Reshape states from (256,) as shown above in policy.compute_single_action to (1, 1, 256)
        state_in_h = state_in_h.view(1, 1, -1)  # (num_layers, batch_size, hidden_size)
        state_in_c = state_in_c.view(1, 1, -1)
        input_dict = {
            SampleBatch.OBS: obs,
            "state_in": [state_in_h, state_in_c],
            SampleBatch.PREV_ACTIONS: prev_actions.unsqueeze(-1) if prev_actions.dim() == 1 else prev_actions,
            "seq_lens": torch.ones(obs.size(0), dtype=torch.int32),
        }
        output_dict = self.model(input_dict)
        # Assuming the model returns a tuple: (logits, state_h, state_c)
        logits = output_dict[0]
        state_out_h = output_dict[1].squeeze(0).squeeze(0)  # Convert to 1-D (256,)
        state_out_c = output_dict[2].squeeze(0).squeeze(0)
        return logits, state_out_h, state_out_c


# Wrap the original model (`model` is the trained policy's model, e.g. algo.get_policy().model)
wrapped_model = ModelWrapper(model)
wrapped_model.eval()

obs = torch.tensor([[-0.1823, 3.8495, -0.0993, 1.2273]])
state_in_h = torch.zeros(1, 1, 256)  # Initial hidden state (num_layers, batch_size, hidden_size)
state_in_c = torch.zeros(1, 1, 256)  # Initial cell state (num_layers, batch_size, hidden_size)
prev_actions = torch.zeros(1, 1, dtype=torch.int64)

# Combine inputs into a tuple for ONNX export
example_inputs = (obs, state_in_h, state_in_c, prev_actions)

# Export the model
torch.onnx.export(
    wrapped_model,
    example_inputs,
    "cartpole_r2d2_lstm.onnx",
    export_params=True,
    opset_version=17,
    do_constant_folding=True,
    input_names=["obs", "state_in_h", "state_in_c", "prev_actions"],
    output_names=["logits", "state_out_h", "state_out_c"],
    dynamic_axes={
        "obs": {0: "batch_size"},
        "prev_actions": {0: "batch_size"},
        "logits": {0: "batch_size"},
        # States are fixed-size 1-D; no dynamic axes needed
    },
)
```
Error:

```
  File "C:\Users...\cartpole_ray2_6_1_r2d2_lstm_training_to_onnx_not_working.py", line 432, in <module>
    torch.onnx.export(
  File "C:\Users...\__init__.py", line 375, in export
    export(
  File "C:\Users...\utils.py", line 502, in export
    _export(
  File "C:\Users...\utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "C:\Users...\utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "C:\Users...\utils.py", line 997, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "C:\Users...\utils.py", line 904, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "C:\Users...\_trace.py", line 1500, in _get_trace_graph
    outs = ONNXTracedModule(
  File "C:\Users...\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users...\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users...\_trace.py", line 139, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "C:\Users...\_trace.py", line 130, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "C:\Users...\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users...\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users...\module.py", line 1726, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "C:\Users...\cartpole_ray2_6_1_r2d2_lstm_training_to_onnx_not_working.py", line 373, in forward
    output_dict = self.model(input_dict)
  File "C:\Users...\modelv2.py", line 266, in __call__
    res = self.forward(restored, state or [], seq_lens)  # why is [] being passed into forward for recurrent model???
  File "C:\Users...\recurrent_net.py", line 265, in forward
    return super().forward(input_dict, state, seq_lens)
  File "C:\Users...\recurrent_net.py", line 100, in forward
    output, new_state = self.forward_rnn(inputs, state, seq_lens)
  File "C:\Users...\recurrent_net.py", line 297, in forward_rnn
    inputs, [torch.unsqueeze(state[0], 0), torch.unsqueeze(state[1], 0)]
IndexError: list index out of range
```
Hi Christopher,
Can you try updating Ray to the latest version, if possible? Here is the list of releases: Releases · ray-project/ray · GitHub.
OK, so just to summarize: when exporting an RLlib R2D2 model (with an LSTM) to ONNX, something in the tracing process occasionally causes the “state” that’s passed in to become an empty list (i.e. []). RLlib’s internal code has logic like `state or []`, which behaves fine in normal training because RLlib usually does provide valid hidden/cell states. But when ONNX tracing kicks in and doesn’t properly capture that state, you end up with an empty list instead of the usual (h, c). Then, since the code expects state[0] and state[1], it throws an IndexError.
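The fallback can be reproduced without Ray, torch, or ONNX. The sketch below uses a hypothetical `FakeRecurrentModel` (an illustration, not RLlib code) that copies only the two lines visible in the traceback: `self.forward(restored, state or [], seq_lens)` in modelv2.py and the `state[0]`/`state[1]` indexing in `forward_rnn`. Under that assumption, the empty list appears whenever the state is not passed as the second argument of the model call, for example when it is only stored under `"state_in"` in `input_dict`, as in `output_dict = self.model(input_dict)` in the wrapper. That may be worth checking before suspecting the tracer itself.

```python
class FakeRecurrentModel:
    """Hypothetical stand-in (NOT an RLlib class) that copies the two lines
    visible in the traceback: __call__ forwards `state or []`, and the
    recurrent forward reads state[0] and state[1]."""

    def __call__(self, input_dict, state=None, seq_lens=None):
        # Same fallback as modelv2.py: a missing state becomes an empty list.
        return self.forward(input_dict, state or [], seq_lens)

    def forward(self, input_dict, state, seq_lens):
        # Same indexing as forward_rnn: raises IndexError if state == [].
        h, c = state[0], state[1]
        return input_dict["obs"], [h, c]


model = FakeRecurrentModel()
input_dict = {"obs": [[-0.18, 3.85, -0.10, 1.23]], "state_in": ["h0", "c0"]}

# 1) State only inside input_dict: __call__ never receives it,
#    so forward() gets [] and indexing fails.
try:
    model(input_dict)
    state_was_empty = False
except IndexError:
    state_was_empty = True

# 2) State and seq_lens passed as positional arguments: forward() receives them.
obs_out, state_out = model(input_dict, ["h0", "c0"], [1])
print(state_was_empty, state_out)
```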
At first glance, maybe you can add a check beforehand and always pass a valid state? I’m not an ONNX expert, but I think you could remove or override the `state or []` fallback so the model never sees an empty list. For example, if your forward sees that the provided state is empty or has the wrong shape, just create a dummy pair of zero tensors instead. To debug it, you can print the shapes and contents of your state tensors (h, c) just before the ONNX export to confirm they’re valid.
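A check like the one described could look roughly like this. `ensure_lstm_state` is a hypothetical helper, not an RLlib function. Because `forward_rnn` calls `torch.unsqueeze(state[0], 0)` itself (visible in the traceback), this sketch assumes each state tensor should be shaped `(batch_size, lstm_cell_size)`, e.g. `(1, 256)`, rather than `(1, 1, 256)`; the helper reshapes a valid pair to that shape and substitutes zeros only when the state is missing or empty.

```python
import torch


def ensure_lstm_state(state, batch_size, cell_size):
    """Hypothetical helper: return a valid [h, c] pair for an LSTM model.

    Assumes one (batch_size, cell_size) tensor each for the hidden and
    cell state, matching the unsqueeze(.., 0) done inside forward_rnn.
    """
    if not state:
        # None or []: substitute zero-initialized hidden and cell states.
        return [torch.zeros(batch_size, cell_size), torch.zeros(batch_size, cell_size)]
    if len(state) != 2:
        raise ValueError(f"expected [h, c], got {len(state)} state tensors")
    # Reshape e.g. (1, 1, 256) or (256,) to (batch_size, cell_size).
    return [s.reshape(batch_size, cell_size) for s in state]


# Debug print before export: an empty state is replaced by zeros ...
print([tuple(s.shape) for s in ensure_lstm_state([], 1, 256)])
# ... and a (1, 1, 256) pair is reshaped while keeping its values.
h, c = ensure_lstm_state([torch.ones(1, 1, 256), torch.ones(1, 1, 256)], 1, 256)
print(tuple(h.shape), float(h.sum()))
```

Inside the wrapper’s `forward`, the guarded pair would then be passed as the second positional argument, e.g. `self.model(input_dict, ensure_lstm_state([state_in_h, state_in_c], obs.size(0), 256), seq_lens)`, since `ModelV2.__call__` takes `(input_dict, state=None, seq_lens=None)`; here `seq_lens` is the same `torch.ones(...)` tensor the wrapper currently stores in `input_dict`. The call should then return a pair `(model_out, [h, c])` rather than three separate tensors.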
Let me know if this makes any sense, and whether it worked or updating Ray did.
Thanks, Christina, for the idea to try. The project I’m working on uses Ray 2.6.1 or Ray 2.7.1, so I will have to see whether it makes sense for us to upgrade the Ray version.