1. Severity of the issue: (select one)
High: Completely blocks me.
2. Environment:
- Ray version: 2.42.1
- Python version: 3.12.9
- OS: Windows 11
- Cloud/Infrastructure: Local Machine Only
- Other libs/tools (if relevant): None
3. What happened vs. what you expected:
- Using: trainer = config.build_algo(), result = trainer.train()
- Expected: The algorithm collects data by running episodes until sampling is complete, then performs the training/backprop step, and returns a finished result.
- Actual: The algorithm collects data and all episodes finish with the data fully collected, but before training begins the program crashes due to the following error:
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\numpy\_core\shape_base.py", line 449, in stack
raise ValueError('all input arrays must have the same shape')
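For context, the training entry point is essentially the minimal sketch below (PPO matches the stack trace further down; "MyEnv" is a placeholder for my custom, separately registered environment, and the rest of my config is omitted):

from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("MyEnv")  # placeholder env name
trainer = config.build_algo()   # builds the PPO Algorithm from the config
result = trainer.train()        # crashes during the first training iteration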
Steps for Problem Diagnosis
- In the shape_base.py file, I made the following code change:
shapes = {arr.shape for arr in arrays}
print("shapes: ", shapes)  # ADDED HERE, LINE 447
if len(shapes) != 1:
    raise ValueError('all input arrays must have the same shape')
- The print immediately prior to this exception is the following:
shapes: {(58, 1, 7), (35, 1, 7), (41, 1, 7), (38, 1, 7), (57, 1, 7)}
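- The failure reproduces in isolation with arrays shaped like those in the printout above (standalone repro, independent of RLlib):

import numpy as np

# Per-episode arrays whose first (timestep) dimension differs, mirroring the
# shapes printed above, cannot be stacked along axis 0.
episodes = [np.zeros((n, 1, 7)) for n in (58, 35, 41, 38, 57)]
np.stack(episodes, axis=0)  # ValueError: all input arrays must have the same shape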
- Through testing and observation I determined that the number of arrays here corresponds to the number of episodes that were run during exploration, and the first dimension of each array corresponds to the number of steps executed in that episode (the other two axes belong to my observation space, which is basically just a 2D container for my features).
- So the problem seems fairly obvious: the episodes are not being padded or truncated properly inside Ray before it tries to stack them once exploration has completed (note that setting config.env_runners(batch_mode="truncate_episodes") does not change this error at all).
- I understand this could be a problem with how my data is structured, but based on the shape prints I suspect I am missing some configuration parameter for my custom RLModule (I have tried changing train_batch_size, minibatch_size, rollout_fragment_length, max_seq_len, and use_lstm, and nothing changes; a sketch of where I set these follows this list), or that I have not implemented something in my custom RLModule, or that it is some kind of internal failure.
- I was following this implementation as a guide:
https://github.com/ray-project/ray/blob/master/rllib/examples/rl_modules/classes/lstm_containing_rlm.py#L99
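- For reference, here is a rough sketch of where I am setting the parameters mentioned above (the values are illustrative rather than my exact settings, and passing max_seq_len through model_config is simply how I am currently doing it):

from ray.rllib.algorithms.ppo import PPOConfig

# Illustrative values only; "MyEnv" is a placeholder for my custom environment.
config = (
    PPOConfig()
    .environment("MyEnv")
    .env_runners(
        rollout_fragment_length=64,      # tried several values
        batch_mode="truncate_episodes",  # does not change the error
    )
    .training(
        train_batch_size=512,            # tried several values
        minibatch_size=64,               # tried several values
    )
    .rl_module(model_config={"max_seq_len": 16})  # also tried use_lstm here
)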
Additional Notes on the Custom RLModule
- Originally I did not have an LSTM implementation, and training worked fine across multiple epochs and other configuration settings; this error only started occurring when I began returning the internal state and setting config params like "max_seq_len".
- My module is designed to take heterogeneous graphs as input (only the node features change; edge_index remains the same), so the inputs received by my model have the shape [B, T, N, F], where B = batch, T = timestep, N = node, F = features (I am not emitting T from the env; the T axis started being added by Ray when I began using the LSTM).
- My custom RLModule inherits from: TorchRLModule, ValueFunctionAPI
- Implemented methods are: setup(), get_initial_state(), _forward(), compute_values() (a rough outline is shown below)
- If more specific details on the code are needed, please ask and I can provide them; there is a lot to explain since I am using custom envs and custom wrappers, and I do not know what is relevant.
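Rough outline of the module (placeholder names and sizes, not my actual thesis code), following the structure of the linked example:

import torch
from ray.rllib.core.rl_module.apis import ValueFunctionAPI
from ray.rllib.core.rl_module.torch import TorchRLModule


class MyGraphLSTMRLModule(TorchRLModule, ValueFunctionAPI):
    def setup(self):
        # Graph feature encoder, LSTM, and policy/value heads are built here.
        self.lstm_cell_size = 64  # placeholder size

    def get_initial_state(self):
        # Zero-filled LSTM state; returning a non-empty state is what causes
        # RLlib to add the time axis (T) to the batch.
        return {
            "h": torch.zeros(self.lstm_cell_size),
            "c": torch.zeros(self.lstm_cell_size),
        }

    def _forward(self, batch, **kwargs):
        # batch[Columns.OBS] arrives as [B, T, N, F] here once the LSTM state
        # is returned; outputs ACTION_DIST_INPUTS and STATE_OUT.
        ...

    def compute_values(self, batch, embeddings=None):
        # Value-function head required by ValueFunctionAPI.
        ...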
Full Error Stack Trace:
Traceback (most recent call last):
File "z:\Thesis\Reinforcement Learning\Trainer.py", line 182, in <module>
result = trainer.train()
^^^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\tune\trainable\trainable.py", line 331, in train
raise skipped from exception_cause(skipped)
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\tune\trainable\trainable.py", line 328, in train
result = self.step()
^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\algorithms\algorithm.py", line 1022, in step
train_results, train_iter_ctx = self._run_one_training_iteration()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\algorithms\algorithm.py", line 3382, in _run_one_training_iteration
training_step_return_value = self.training_step()
^^^^^^^^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\algorithms\ppo\ppo.py", line 429, in training_step
learner_results = self.learner_group.update_from_episodes(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\core\learner\learner_group.py", line 327, in update_from_episodes
return self._update(
^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\core\learner\learner_group.py", line 422, in _update
_learner_update(
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\core\learner\learner_group.py", line 385, in _learner_update
result = _learner.update_from_episodes(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\core\learner\learner.py", line 1086, in update_from_episodes
self._update_from_batch_or_episodes(
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\core\learner\learner.py", line 1362, in _update_from_batch_or_episodes
batch = self._learner_connector(
^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\connectors\learner\learner_connector_pipeline.py", line 38, in __call__
ret = super().__call__(
^^^^^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\connectors\connector_pipeline_v2.py", line 111, in __call__
batch = connector(
^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\connectors\common\batch_individual_items.py", line 182, in __call__
else batch_fn(
^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\utils\spaces\space_utils.py", line 378, in batch
ret = tree.map_structure(lambda *s: np_func(s, axis=0), *list_of_structs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\tree\__init__.py", line 429, in map_structure
[func(*args) for args in zip(*map(flatten, structures))])
^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\ray\rllib\utils\spaces\space_utils.py", line 378, in <lambda>
ret = tree.map_structure(lambda *s: np_func(s, axis=0), *list_of_structs)
^^^^^^^^^^^^^^^^^^
File "Z:\Thesis\Reinforcement Learning\venv\Lib\site-packages\numpy\_core\shape_base.py", line 449, in stack
raise ValueError('all input arrays must have the same shape')