For exporting r2d2+lstm to onnx, why is empty state being passed in?

christopher · February 3, 2025, 5:38pm

I’m trying to export an R2D2+LSTM model (either the built-in LSTM or a custom one) to ONNX. I have successfully exported a PPO+LSTM (built-in) model on both Ray 2.6.1 and Ray 2.41.

I’m getting an error where the state becomes an empty list, [ ].

I saw these posts about the state becoming an empty list ([ ]):

github.com/ray-project/ray

[rllib] Recurrent Torch models do not work with A2C/A3C

opened 07:54PM - 22 May 20 UTC

closed 05:57AM - 26 Nov 20 UTC

Arthur-Null

bug stale

### What is the problem? ray-0.9.0.dev0 python 3.7 pytorch 1.5 Traceback… (most recent call last): File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial result = self.trial_executor.fetch_result(trial) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1522, in get raise value.as_instanceof_cause() ray.exceptions.RayTaskError(AttributeError): ray::A2C.train() (pid=238206, ip=10.190.174.127) File "python/ray/_raylet.pyx", line 460, in ray._raylet.execute_task File "python/ray/_raylet.pyx", line 414, in ray._raylet.execute_task.function_executor File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 500, in train raise e File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 486, in train result = Trainable.train(self) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/tune/trainable.py", line 260, in train result = self._train() File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 132, in _train return self._train_exec_impl() File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 170, in _train_exec_i res = next(self.train_exec_impl) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 682, in __next__ return next(self.built_iterator) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 695, in apply_foreach for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 765, in apply_filter for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 765, in apply_filter for item in it: File 
"/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 695, in apply_foreach for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 798, in apply_flatten for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 750, in add_wait_hooks item = next(it) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 695, in apply_foreach for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 695, in apply_foreach for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 70, in sampler yield workers.local_worker().sample() File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 521, in sample batches = [self.input_reader.next()] File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 56, in next batches = [self.get_data()] File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 101, in get_data item = next(self.rollout_provider) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 359, in _env_runner callbacks, soft_horizon, no_done_at_end, observation_fn) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 521, in _process_observati outputs.append(episode.batch_builder.build_and_reset(episode)) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sample_batch_builder.py", line 205, in build self.postprocess_batch_so_far(episode) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sample_batch_builder.py", line 153, in postp pre_batch, other_batches, episode) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/policy/torch_policy_template.py", line 157, in postproc 
other_agent_batches, episode) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/a3c/a3c_torch_policy.py", line 46, in add_advant last_r = policy._value(sample_batch[SampleBatch.NEXT_OBS][-1]) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/a3c/a3c_torch_policy.py", line 78, in _value _ = self.model({"obs": torch.Tensor([obs]).to(self.device)}, [], [1]) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/models/modelv2.py", line 177, in __call__ res = self.forward(restored, state or [], seq_lens) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/models/torch/recurrent_net.py", line 69, in forward input_dict["obs_flat"].float(), seq_lens, framework="torch"), File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/policy/rnn_sequencing.py", line 141, in add_time_dimens max_seq_len = padded_batch_size // seq_lens.shape[0] AttributeError: 'list' object has no attribute 'shape' ### Reproduction (REQUIRED) ``` import ray from ray import tune from ray.tune.registry import register_env from ray.rllib.examples.env.repeat_after_me_env import RepeatAfterMeEnv from ray.rllib.examples.env.repeat_initial_obs_env import RepeatInitialObsEnv from ray.rllib.examples.models.rnn_model import RNNModel, TorchRNNModel from ray.rllib.models import ModelCatalog from ray.rllib.utils.test_utils import check_learning_achieved parser = argparse.ArgumentParser() parser.add_argument("--run", type=str, default="A3C") parser.add_argument("--env", type=str, default="RepeatAfterMeEnv") parser.add_argument("--num-cpus", type=int, default=0) parser.add_argument("--as-test", action="store_true") parser.add_argument("--torch", action="store_true") parser.add_argument("--stop-reward", type=float, default=90) parser.add_argument("--stop-iters", type=int, default=100) parser.add_argument("--stop-timesteps", type=int, default=100000) if __name__ == "__main__": args = parser.parse_args() 
ray.init(num_cpus=args.num_cpus or None) ModelCatalog.register_custom_model( "rnn", TorchRNNModel if args.torch else RNNModel) register_env("RepeatAfterMeEnv", lambda c: RepeatAfterMeEnv(c)) register_env("RepeatInitialObsEnv", lambda _: RepeatInitialObsEnv()) config = { "env": args.env, "env_config": { "repeat_delay": 2, }, "gamma": 0.9, "num_workers": 0, "num_envs_per_worker": 20, "entropy_coeff": 0.001, "vf_loss_coeff": 1e-5, "model": { "custom_model": "rnn", "max_seq_len": 20, }, "use_pytorch": args.torch, } stop = { "training_iteration": args.stop_iters, "timesteps_total": args.stop_timesteps, "episode_reward_mean": args.stop_reward, } results = tune.run(args.run, config=config, stop=stop) if args.as_test: check_learning_achieved(results, args.stop_reward) ray.shutdown() ``` This is the example code provided in ray/rllib/example/custom_rnn_model.py run `python custom_rnn_model.py --run=A3C --torch` If we cannot run your script, we cannot fix your issue. - [x] I have verified my script runs in a clean environment and reproduces the issue. - [x] I have verified the issue also occurs with the [latest wheels](https://docs.ray.io/en/latest/installation.html).

github.com/ray-project/ray

[rllib] Recurrent torch models cause error with A2C

opened 01:10AM - 22 Mar 20 UTC

closed 06:50PM - 08 May 20 UTC

pmacalpine

bug

### What is the problem? When I try and use a recurrent torch model with A2C …I get the following list index out of range error due to a state passed to the model's forward function being empty: `2020-03-21 23:38:55,276 ERROR trial_runner.py:513 -- Trial A2C_CartPole-v1_00000: Error processing event. Traceback (most recent call last): File "/data/home/patmac/ray/python/ray/tune/trial_runner.py", line 459, in _process_trial result = self.trial_executor.fetch_result(trial) File "/data/home/patmac/ray/python/ray/tune/ray_trial_executor.py", line 381, in fetch_result result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT) File "/data/home/patmac/ray/python/ray/worker.py", line 1511, in get raise value.as_instanceof_cause() ray.exceptions.RayTaskError(IndexError): ray::A2C.train() (pid=17292, ip=172.16.226.199) File "python/ray/_raylet.pyx", line 445, in ray._raylet.execute_task File "python/ray/_raylet.pyx", line 423, in ray._raylet.execute_task.function_executor File "/data/home/patmac/ray/python/ray/rllib/agents/trainer.py", line 502, in train raise e File "/data/home/patmac/ray/python/ray/rllib/agents/trainer.py", line 491, in train result = Trainable.train(self) File "/data/home/patmac/ray/python/ray/tune/trainable.py", line 256, in train result = self._train() File "/data/home/patmac/ray/python/ray/rllib/agents/trainer_template.py", line 146, in _train return self._train_exec_impl() File "/data/home/patmac/ray/python/ray/rllib/agents/trainer_template.py", line 178, in _train_exec_impl res = next(self.train_exec_impl) File "/data/home/patmac/ray/python/ray/util/iter.py", line 635, in __next__ return next(self.built_iterator) File "/data/home/patmac/ray/python/ray/util/iter.py", line 619, in set_restore_context for item in it: File "/data/home/patmac/ray/python/ray/util/iter.py", line 645, in apply_foreach for item in it: File "/data/home/patmac/ray/python/ray/util/iter.py", line 684, in apply_filter for item in it: File 
"/data/home/patmac/ray/python/ray/util/iter.py", line 645, in apply_foreach for item in it: File "/data/home/patmac/ray/python/ray/util/iter.py", line 716, in apply_flatten for item in it: File "/data/home/patmac/ray/python/ray/util/iter.py", line 669, in add_wait_hooks item = next(it) File "/data/home/patmac/ray/python/ray/util/iter.py", line 645, in apply_foreach for item in it: File "/data/home/patmac/ray/python/ray/util/iter.py", line 645, in apply_foreach for item in it: File "/data/home/patmac/ray/python/ray/util/iter.py", line 645, in apply_foreach for item in it: File "/data/home/patmac/ray/python/ray/util/iter.py", line 395, in base_iterator yield ray.get(futures, timeout=timeout) ray.exceptions.RayTaskError(IndexError): ray::RolloutWorker.par_iter_next() (pid=17293, ip=172.16.226.199) File "python/ray/_raylet.pyx", line 445, in ray._raylet.execute_task File "python/ray/_raylet.pyx", line 423, in ray._raylet.execute_task.function_executor File "/data/home/patmac/ray/python/ray/util/iter.py", line 957, in par_iter_next return next(self.local_it) File "/data/home/patmac/ray/python/ray/util/iter.py", line 619, in set_restore_context for item in it: File "/data/home/patmac/ray/python/ray/rllib/evaluation/rollout_worker.py", line 251, in gen_rollouts yield self.sample() File "/data/home/patmac/ray/python/ray/rllib/evaluation/rollout_worker.py", line 492, in sample batches = [self.input_reader.next()] File "/data/home/patmac/ray/python/ray/rllib/evaluation/sampler.py", line 53, in next batches = [self.get_data()] File "/data/home/patmac/ray/python/ray/rllib/evaluation/sampler.py", line 96, in get_data item = next(self.rollout_provider) File "/data/home/patmac/ray/python/ray/rllib/evaluation/sampler.py", line 338, in _env_runner callbacks, soft_horizon, no_done_at_end) File "/data/home/patmac/ray/python/ray/rllib/evaluation/sampler.py", line 487, in _process_observations outputs.append(episode.batch_builder.build_and_reset(episode)) File 
"/data/home/patmac/ray/python/ray/rllib/evaluation/sample_batch_builder.py", line 199, in build_and_reset self.postprocess_batch_so_far(episode) File "/data/home/patmac/ray/python/ray/rllib/evaluation/sample_batch_builder.py", line 153, in postprocess_batch_so_far pre_batch, other_batches, episode) File "/data/home/patmac/ray/python/ray/rllib/policy/torch_policy_template.py", line 110, in postprocess_trajectory convert_to_non_torch_type(other_agent_batches), episode) File "/data/home/patmac/ray/python/ray/rllib/agents/a3c/a3c_torch_policy.py", line 46, in add_advantages last_r = policy._value(sample_batch[SampleBatch.NEXT_OBS][-1]) File "/data/home/patmac/ray/python/ray/rllib/agents/a3c/a3c_torch_policy.py", line 74, in _value _ = self.model({"obs": torch.Tensor([obs]).to(self.device)}, [], [1]) File "/data/home/patmac/ray/python/ray/rllib/models/modelv2.py", line 150, in __call__ res = self.forward(restored, state or [], seq_lens) File "examples/torch_rnn.py", line 59, in forward h_in = hidden_state[0].reshape(-1, self.rnn_hidden_dim) IndexError: list index out of range` It seems problematic that `[]` is being passed as the state in [a3c_torch_policy.py](https://github.com/ray-project/ray/blob/89d959fd6ac206a1a3b5a6cb151d19290df6a235/rllib/agents/a3c/a3c_torch_policy.py#L74). Maybe this is slightly related to #7206 and recurrent models not being supported yet with torch, but if would be really nice if they are supported. *Ray version and other system information (Python version, TensorFlow version, OS):* Ray: ray: 0.9.0.dev0 (revision 89d959fd6ac206a1a3b5a6cb151d19290df6a235) python: 3.7.2 torch: 1.4.0 OS: Ubuntu 16.04.6 LTS ### Reproduction (REQUIRED) Please provide a script that can be run to reproduce the issue. 
The script should have **no external library dependencies** (i.e., use fake or mock data / environments): ```python import numpy as np import gym from gym.spaces import Discrete, Box from ray.rllib.models import Model, ModelCatalog from ray.rllib.utils.annotations import override from ray.rllib.utils import try_import_torch import ray from ray import tune from ray.rllib.models.torch.torch_modelv2 import TorchModelV2 from ray.rllib.models.torch.misc import normc_initializer, valid_padding, SlimFC from ray.rllib.models.preprocessors import get_preprocessor _, nn = try_import_torch() import torch import torch.nn.functional as F class RNNModel(TorchModelV2, nn.Module): def __init__(self, obs_space, action_space, num_outputs, model_config, name): TorchModelV2.__init__(self, obs_space, action_space, num_outputs, model_config, name) nn.Module.__init__(self) self.obs_size = _get_size(obs_space) self.rnn_hidden_dim = model_config["lstm_cell_size"] self.fc1 = nn.Linear(self.obs_size, self.rnn_hidden_dim) self.rnn = nn.GRUCell(self.rnn_hidden_dim, self.rnn_hidden_dim) self._logits = SlimFC( self.rnn_hidden_dim, num_outputs, initializer=nn.init.xavier_uniform_) self._value_branch = SlimFC( self.rnn_hidden_dim, 1, initializer=normc_initializer()) self._cur_value = None @override(TorchModelV2) def get_initial_state(self): # make hidden states on same device as model return [self.fc1.weight.new(1, self.rnn_hidden_dim).zero_().squeeze(0)] @override(TorchModelV2) def forward(self, input_dict, hidden_state, seq_lens): x = nn.functional.relu(self.fc1(input_dict["obs_flat"].float())) h_in = hidden_state[0].reshape(-1, self.rnn_hidden_dim) h = self.rnn(x, h_in) logits = self._logits(h) self._cur_value = self._value_branch(h).squeeze(1) return logits, [h] @override(TorchModelV2) def value_function(self): assert self._cur_value is not None, "must call forward() first" return self._cur_value def _get_size(obs_space): return get_preprocessor(obs_space)(obs_space).size if __name__ == 
"__main__": ray.init() ModelCatalog.register_custom_model("rnn_model", RNNModel) tune.run( "A2C", stop={ "timesteps_total": 200000000, }, config={ "env": "CartPole-v1", "model": { "custom_model": "rnn_model", "custom_options": { "lstm_cell_size": 64, } }, "num_workers": 4, # parallelism "use_pytorch": True, }, ) ``` If we cannot run your script, we cannot fix your issue. - [x] I have verified my script runs in a clean environment and reproduces the issue. - [x] I have verified the issue also occurs with the [latest wheels](https://ray.readthedocs.io/en/latest/installation.html).

@sven1977

In my debug output while exporting to ONNX, I saw len(state) == 2 (hidden state, cell state) several times, and then suddenly len(state) == 0, i.e. state == [ ].

I don’t think a recurrent network should have an empty state passed in. Is this a bug, or is my code wrong somewhere? How can I resolve the error below, or in which version of Ray is this fixed? Thanks.

Version

ray 2.6.1
onnx 1.16.1
onnx2pytorch 0.5.1
torch 2.5.1
torchvision 0.20.1
Python 3.9.0
windows 10 pro

from ray.rllib.algorithms.r2d2 import R2D2Config

config = (
    R2D2Config()
    .environment("CartPole-v1")  # Replace with your environment
    .framework("torch")  # Use PyTorch framework
    .training(
        model={
            "use_lstm": True,
            "max_seq_len": 50,
            "lstm_cell_size": 256,
            "fcnet_hiddens": [256],
            "lstm_use_prev_action": False,
        }
    )
)

code

import torch
import torch.nn as nn
from ray.rllib.policy.sample_batch import SampleBatch

class ModelWrapper(nn.Module):
    def __init__(self, model):
        super(ModelWrapper, self).__init__()
        self.model = model

    def forward(self, obs, state_in_h, state_in_c, prev_actions):
        # Reshape states from (256,) as shown above in policy.compute_single_action to (1, 1, 256)
        state_in_h = state_in_h.view(1, 1, -1)  # (num_layers, batch_size, hidden_size)
        state_in_c = state_in_c.view(1, 1, -1)
        
        input_dict = {
            SampleBatch.OBS: obs,
            "state_in": [state_in_h, state_in_c],
            SampleBatch.PREV_ACTIONS: prev_actions.unsqueeze(-1) if prev_actions.dim() == 1 else prev_actions,
            "seq_lens": torch.ones(obs.size(0), dtype=torch.int32),
        }
        
        output_dict = self.model(input_dict)
        
        # Assuming the model returns a tuple: (logits, state_h, state_c)
        logits = output_dict[0]
        state_out_h = output_dict[1].squeeze(0).squeeze(0)  # Convert to 1-D (256,)
        state_out_c = output_dict[2].squeeze(0).squeeze(0)
        
        return logits, state_out_h, state_out_c

# Wrap the original model
wrapped_model = ModelWrapper(model)
wrapped_model.eval()


obs = torch.tensor([[-0.1823,  3.8495, -0.0993,  1.2273]])
state_in_h = torch.zeros(1, 1, 256)  # Initial hidden state (num_layers, batch_size, hidden_size)
state_in_c = torch.zeros(1, 1, 256)  # Initial cell state (num_layers, batch_size, hidden_size)
prev_actions = torch.zeros(1,1, dtype=torch.int64)  

# Combine inputs into a tuple for ONNX export
example_inputs = (obs, state_in_h, state_in_c, prev_actions)

# Export the model
torch.onnx.export(
    wrapped_model,
    example_inputs,
    "cartpole_r2d2_lstm.onnx",
    export_params=True,
    opset_version=17,
    do_constant_folding=True,
    input_names=["obs", "state_in_h", "state_in_c", "prev_actions"],
    output_names=["logits", "state_out_h", "state_out_c"],
    dynamic_axes={
        "obs": {0: "batch_size"},
        "prev_actions": {0: "batch_size"},
        "logits": {0: "batch_size"},
        # States are fixed-size 1-D; no dynamic axes needed
    },
)

Error:




 File "C:\Users...\cartpole_ray2_6_1_r2d2_lstm_training_to_onnx_not_working.py", line 432, in <module>       
    torch.onnx.export(
  File "C:\Users...\__init__.py", line 375, in export
    export(
  File "C:\Users...\utils.py", line 502, in export
    _export(
  File "C:\Users...\utils.py", line 1564, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "C:\Users...\utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "C:\Users...\utils.py", line 997, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "C:\Users...\utils.py", line 904, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "C:\Users...\_trace.py", line 1500, in _get_trace_graph
    outs = ONNXTracedModule(
  File "C:\Users...\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users...\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users...\_trace.py", line 139, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "C:\Users...\_trace.py", line 130, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "C:\Users...\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users...\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users...\module.py", line 1726, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "C:\Users...\cartpole_ray2_6_1_r2d2_lstm_training_to_onnx_not_working.py", line 373, in forward        
    output_dict = self.model(input_dict)
  File "C:\Users...\modelv2.py", line 266, in __call__
    **res = self.forward(restored, state or [], seq_lens)** # why is [] being passed into forward for recurrent model???
  File "C:\Users...\recurrent_net.py", line 265, in forward
    return super().forward(input_dict, state, seq_lens)
  File "C:\Users...\recurrent_net.py", line 100, in forward
    output, new_state = self.forward_rnn(inputs, state, seq_lens)
  File "C:\Users...\recurrent_net.py", line 297, in forward_rnn
    inputs, [torch.unsqueeze(state[0], 0), torch.unsqueeze(state[1], 0)]
IndexError: list index out of range

christina · February 4, 2025, 7:40pm

Hi christopher,
Can you try updating Ray to the latest version if possible? Here is the list of releases: Releases · ray-project/ray · GitHub.

Ok so just to summarize, when exporting an RLlib R2D2 model (with LSTM) to ONNX, something in the tracing process occasionally causes the “state” that’s passed in to become an empty list (i.e. []). RLlib’s internal code has logic like “state or []”, which behaves fine in normal training (because RLlib usually does provide valid hidden/cell states). But when ONNX tracing kicks in and doesn’t properly capture that state, you end up with an empty list instead of the usual (h, c). Then, of course, if the code expects state[0] or state[1], it throws an “IndexError.”
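To make the fallback concrete, here is a minimal pure-Python sketch of that call path. `TinyModel` is a hypothetical stand-in, not RLlib's `ModelV2`; only the `state or []` expression is taken from the traceback above. The lookup of numbered `state_in_0`, `state_in_1`, ... keys is an assumption about how the state gets collected from the input dict when no state argument is given, so check it against the `modelv2.py` in your installed Ray version.

```python
class TinyModel:
    """Hypothetical stand-in for a recurrent model; NOT RLlib's ModelV2."""

    def __call__(self, input_dict, state=None, seq_lens=None):
        # Assumption: when no state argument is given, state is gathered
        # from numbered keys "state_in_0", "state_in_1", ...
        if not state:
            state = []
            i = 0
            while f"state_in_{i}" in input_dict:
                state.append(input_dict[f"state_in_{i}"])
                i += 1
        # Same fallback expression as in the traceback.
        return self.forward(input_dict, state or [], seq_lens)

    def forward(self, input_dict, state, seq_lens):
        # An LSTM needs both entries; an empty list fails here.
        h, c = state[0], state[1]
        return input_dict["obs"], [h, c]


model = TinyModel()
h0, c0 = [0.0] * 4, [0.0] * 4

# State passed explicitly as the second argument: forward() sees it.
_, state_out = model({"obs": [1.0]}, [h0, c0], [1])

# State under numbered keys: collected by the loop.
_, state_from_keys = model({"obs": [1.0], "state_in_0": h0, "state_in_1": c0})

# State under a single "state_in" key: never looked at, so forward() gets [].
try:
    model({"obs": [1.0], "state_in": [h0, c0]})
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__
```

If that assumption holds, the wrapper in the question hits the third case: its dict carries the state under `"state_in"`, and `self.model(input_dict)` is called without a state argument.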

On first thought, maybe you can try to do a check beforehand and always pass a valid state? I’m not an expert in ONNX but I’m thinking you can get rid of or override the “state or []” fallback so the model never sees an empty list. For example, if your forward sees that the provided state has the wrong shape or is empty, just create a dummy pair of zero tensors instead. To debug it, you can try to print the shapes and contents of your state tensors (h, c) just before doing the ONNX export to confirm they’re valid.
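A minimal sketch of such a guard, written in plain Python so it runs without Ray or PyTorch. `ensure_lstm_state` and its `zeros` parameter are hypothetical names, not RLlib API; with a real model you would pass something like `lambda b, n: torch.zeros(b, n)` as `zeros` and call the guard at the top of your wrapper's `forward`.

```python
def ensure_lstm_state(state, batch_size, cell_size, zeros=None):
    """Return a two-element [h, c] state, substituting zeros if it is missing.

    Hypothetical helper, not part of RLlib. `zeros(batch, size)` builds one
    zero-filled state; the default builds nested Python lists.
    """
    if zeros is None:
        zeros = lambda b, n: [[0.0] * n for _ in range(b)]
    if state is None or len(state) != 2:
        # Covers None, [] and any other wrong-length state.
        return [zeros(batch_size, cell_size), zeros(batch_size, cell_size)]
    return list(state)


# An empty state is replaced by zeros of shape (batch_size, cell_size).
fixed = ensure_lstm_state([], batch_size=1, cell_size=3)

# A valid (h, c) pair is passed through unchanged.
kept = ensure_lstm_state([[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]], 1, 3)
```

One caveat: zeros created inside `forward` during tracing would likely be baked into the exported graph as constants, so for an ONNX export it is usually preferable to make sure the real state tensors actually reach the model and to use a guard like this mainly for debugging.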

Lemme know if this makes any sense or if that worked or if updating Ray worked.

christopher · February 4, 2025, 9:59pm

Thanks, Christina, for the idea. The project that I’m working on is on Ray 2.6.1 or Ray 2.7.1. I will have to see if it makes sense for us to upgrade the Ray version.

mannyv · February 4, 2025, 10:00pm

@christopher @christina,

R2D2 and many of the other specialized RL algorithms were deprecated in Ray 2.37. If you do update, make sure not to update past 2.36.

Manny

christopher · February 4, 2025, 10:06pm

Thanks Manny. Will keep that in mind

christopher · February 4, 2025, 10:12pm

@mannyv , I saw you posted here:

Do you know how I can get access to this custom RNN code example, or do you happen to have a saved copy? It looks like it’s no longer available online.
https://github.com/ray-project/ray/blob/master/rllib/examples/models/rnn_model.py

mannyv · February 4, 2025, 10:18pm

Use tags to go back in time =).

github.com/ray-project/ray

rllib/examples/models/rnn_model.py

ray-2.10.0

import numpy as np

from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.preprocessors import get_preprocessor
from ray.rllib.models.tf.recurrent_net import RecurrentNetwork
from ray.rllib.models.torch.recurrent_net import RecurrentNetwork as TorchRNN
from ray.rllib.utils.annotations import override
from ray.rllib.utils.framework import try_import_tf, try_import_torch

tf1, tf, tfv = try_import_tf()
torch, nn = try_import_torch()


class RNNModel(RecurrentNetwork):
    """Example of using the Keras functional API to define a RNN model."""

    def __init__(
        self,
        obs_space,
        action_space,

This file has been truncated.
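The tag trick amounts to swapping the branch segment of the GitHub URL for a release tag. A small sketch using only the standard library: `pin_github_blob_url` is a hypothetical helper, and it assumes a `/<owner>/<repo>/blob/<ref>/<path>` layout with a ref that contains no slashes.

```python
from urllib.parse import urlsplit, urlunsplit


def pin_github_blob_url(url, ref):
    """Swap the branch/tag segment of a github.com "blob" URL for `ref`.

    Hypothetical helper; assumes the path looks like
    /<owner>/<repo>/blob/<ref>/<path...> and that the ref has no slashes.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # segments: ["", owner, repo, "blob", ref, *file_path]
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    segments[4] = ref
    return urlunsplit(parts._replace(path="/".join(segments)))


master_url = (
    "https://github.com/ray-project/ray/blob/master/"
    "rllib/examples/models/rnn_model.py"
)
tagged_url = pin_github_blob_url(master_url, "ray-2.10.0")
```

Whether the file exists at a given tag still has to be checked by opening the resulting URL; the helper only rewrites the string.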

christopher · February 4, 2025, 10:24pm

Thanks so much manny! very clever to use tags!

christina · February 4, 2025, 11:37pm

Good luck and let me know how it goes!!

christopher · February 5, 2025, 6:39pm

@christina it solved the empty state part! thank you!

christopher · February 5, 2025, 6:40pm

this was very helpful too. thanks @mannyv
