ValueError: cannot reshape array of size while aligning memory

Hi, I am using Ray to train on a multi-agent environment whose tuple observation contains a map (2D image) and a position (1D). The model is based on ComplexInputNet.py and uses a centralized critic approach.

I have set my config as follows:

train_batch_size: 3000
rollout_fragment_length: 100
batch_mode: truncate_episodes
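
For reference, here are the same settings written as a Python dict, assuming they are passed through the usual RLlib trainer config (the variable names are only illustrative). If every fragment contributes exactly `rollout_fragment_length` steps, about 30 fragments get concatenated into one train batch:

```python
# The three settings above as a plain dict (illustrative variable names).
config = {
    "train_batch_size": 3000,
    "rollout_fragment_length": 100,
    "batch_mode": "truncate_episodes",
}

# Number of rollout fragments needed to fill one train batch,
# assuming each fragment holds exactly rollout_fragment_length steps.
fragments_per_batch = (
    config["train_batch_size"] // config["rollout_fragment_length"]
)
print(fragments_per_batch)
```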

Then I ran into the following error. The custom model seems to be working and the agents did interact with the environment, but I can't figure out what is causing it.

(pid=16289) 2021-08-12 10:58:20,406	INFO trainer.py:698 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=16289) 2021-08-12 10:58:22,732	WARNING deprecation.py:34 -- DeprecationWarning: `simple_optimizer` has been deprecated. This will raise an error in the future!
(pid=16290) /Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/rllib/policy/sample_batch.py:105: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
(pid=16290)   self[k] = np.array(v)
2021-08-12 10:58:28,367	ERROR trial_runner.py:748 -- Trial CCPPOTrainer_coverage_25790_00000: Error processing event.
Traceback (most recent call last):
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 718, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 688, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 62, in wrapper
    return func(*args, **kwargs)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/worker.py", line 1495, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::CCPPOTrainer.train_buffered() (pid=16289, ip=10.161.213.106)
  File "python/ray/_raylet.pyx", line 501, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 451, in ray._raylet.execute_task.function_executor
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/_private/function_manager.py", line 563, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/tune/trainable.py", line 173, in train_buffered
    result = self.train()
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 608, in train
    raise e
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 594, in train
    result = Trainable.train(self)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/tune/trainable.py", line 232, in train
    result = self.step()
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 178, in step
    res = next(self.train_exec_impl)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 756, in __next__
    return next(self.built_iterator)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  [Previous line repeated 1 more time]
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 876, in apply_flatten
    for item in it:
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 828, in add_wait_hooks
    item = next(it)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/util/iter.py", line 791, in apply_foreach
    result = fn(item)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 185, in __call__
    out = SampleBatch.concat_samples(self.buffer)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/rllib/policy/sample_batch.py", line 152, in concat_samples
    time_major=concat_samples[0].time_major)
  File "/Users/liuyungkai/opt/anaconda3/envs/playground/lib/python3.7/site-packages/ray/rllib/utils/memory.py", line 68, in concat_aligned
    output = flat.reshape(new_shape)
ValueError: cannot reshape array of size 240600 into shape (3000,100)

Under what conditions is the concat_aligned function in memory.py triggered? The reshape succeeded the first few times but then failed. Could someone explain the mechanism behind the scenes? Am I missing data?
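
For anyone hitting the same message, below is a minimal NumPy sketch of how this kind of reshape error can arise. It is a simplified stand-in, not RLlib's actual `concat_aligned`; the helper name `concat_like_aligned` and the array shapes are assumptions chosen for illustration. The idea is that the output shape is derived from the first array's trailing dimensions, so a later array with a different trailing shape makes the total element count disagree with the target shape. In the traceback above, 240600 is smaller than 3000 * 100 = 300000, which is consistent with some rows carrying fewer elements than expected.

```python
import numpy as np


def concat_like_aligned(items):
    """Simplified stand-in: preallocate a flat buffer, then reshape it."""
    total_size = sum(a.size for a in items)
    flat = np.empty(total_size, dtype=items[0].dtype)
    batch_dim = sum(a.shape[0] for a in items)
    # The target shape trusts the FIRST item's trailing dimensions.
    new_shape = (batch_dim,) + items[0].shape[1:]
    out = flat.reshape(new_shape)  # ValueError if trailing shapes differ
    np.concatenate(items, out=out)
    return out


# Consistent fragments: every array is (100, 100), so the reshape works.
good = [np.zeros((100, 100), dtype=np.float32) for _ in range(3)]
good_out = concat_like_aligned(good)
print(good_out.shape)

# One fragment with a different trailing dimension breaks the reshape.
bad = good[:2] + [np.zeros((100, 80), dtype=np.float32)]
error_message = None
try:
    concat_like_aligned(bad)
except ValueError as err:
    error_message = str(err)
print(error_message)
```

This would also explain why the first few calls succeed: the error only appears once a fragment with a mismatched shape ends up in the buffer.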

Sorry… it turns out my CPU or RAM was just running out of memory…

Hey @kyle-playground, thanks for the question and - most of all - for posting its solution here as well.

Sorry, that's not the solution. I found out it was just my own mistake: I forgot to return a tensor…
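
A mistake like this can be caught before the batches are concatenated by checking that every per-step value in a column has the same shape. The sketch below is a generic NumPy check, not part of RLlib; the helper `find_inconsistent_columns` and the column names are hypothetical. A missing return shows up as `None`, whose shape is `()`.

```python
import numpy as np


def find_inconsistent_columns(columns):
    """Return {name: shapes} for columns whose per-step values differ in shape."""
    problems = {}
    for name, values in columns.items():
        shapes = {np.shape(v) for v in values}
        if len(shapes) > 1:
            problems[name] = shapes
    return problems


# Hypothetical collected data: one column has a step where nothing was returned.
columns = {
    "obs_position": [np.zeros(2), np.zeros(2), np.zeros(2)],
    "critic_output": [np.zeros(1), None, np.zeros(1)],
}

problems = find_inconsistent_columns(columns)
print(problems)
```

Columns reported here are the ones that would turn into ragged object arrays, which is also what the VisibleDeprecationWarning in the log hints at.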