[RLlib] compute_single_action() with an LSTM-PPO trainer fails

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Python 3.10.8
ray==2.2.0
tensorflow==2.11.0

I am trying to evaluate my trained agent with compute_single_action(). The model is the default PPO model with an LSTM wrapper (max_seq_len = 10, lstm_cell_size = 128), and I am using "tf" as the framework. The same PPO trainer without the LSTM wrapper works fine.
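For reference, the model settings described above would look roughly like this (a sketch using the RLlib 2.x model config keys; values taken from this post):

```python
# Sketch of the LSTM-wrapped PPO model config described in the post.
# Key names follow RLlib's model catalog ("use_lstm", "max_seq_len",
# "lstm_cell_size"); everything else in a real config is omitted here.
config = {
    "framework": "tf",
    "model": {
        "use_lstm": True,       # wrap the default model with an LSTM
        "max_seq_len": 10,      # truncated-BPTT sequence length
        "lstm_cell_size": 128,  # size of the LSTM hidden/cell state
    },
}
```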

trainer.restore(checkpoint) → works

trainer.train() → works

trainer.compute_single_action(obs) → fails with the error below:

Error Message:

2023-02-03 13:15:49,103 ERROR tf_run_builder.py:50 – Error fetching: [<tf.Tensor ‘default_policy/cond_1/Merge:0’ shape=(?,) dtype=int64>, <tf.Tensor ‘default_policy/model_1/lstm/while/Exit_3:0’ shape=(?, 128) dtype=float32>, <tf.Tensor ‘default_policy/model_1/lstm/while/Exit_4:0’ shape=(?, 128) dtype=float32>, {‘action_prob’: <tf.Tensor ‘default_policy/Exp:0’ shape=(?,) dtype=float32>, ‘action_logp’: <tf.Tensor ‘default_policy/cond_2/Merge:0’ shape=(?,) dtype=float32>, ‘action_dist_inputs’: <tf.Tensor ‘default_policy/Reshape_1:0’ shape=(?, 12) dtype=float32>, ‘vf_preds’: <tf.Tensor ‘default_policy/Reshape_2:0’ shape=(?,) dtype=float32>}], feed_dict={<tf.Tensor ‘default_policy/obs:0’ shape=(?, 40) dtype=float32>: array([[4.97777018e-01, 2.29558937e-02, 1.00000000e+00, 4.97349521e-01,
1.52695383e-01, 4.19409695e-01, 4.96922025e-01, 2.58189322e-01,
2.04762132e-01, 4.96494528e-01, 2.63863812e-01, 2.46728897e-01,
4.96067031e-01, 1.77714728e-01, 3.25536652e-01, 4.98632011e-01,
2.55352076e-02, 1.00000000e+00, 4.99059508e-01, 2.91462471e-02,
4.44290551e-01, 4.99487004e-01, 1.55016766e-01, 4.97798038e-01,
4.99914501e-01, 1.94480268e-01, 3.41411342e-01, 5.00341997e-01,
2.42971370e-01, 2.72856084e-01, 1.00000000e+00, 3.48250044e-02,
4.98204514e-01, 4.73000000e-01, 4.66356019e-01, 4.33725122e-03,
1.46148357e-02, 1.00000000e+00, 1.00000000e-17, 1.00000000e-17]]), <tf.Tensor ‘default_policy/is_exploring:0’ shape=() dtype=bool>: False, <tf.Tensor ‘default_policy/timestep:0’ shape=() dtype=int64>: 2717636}
Traceback (most recent call last):
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1378, in _do_call
return fn(*args)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1361, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1454, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) INVALID_ARGUMENT: You must feed a value for placeholder tensor ‘default_policy/seq_lens’ with dtype int32 and shape [?]
[[{{node default_policy/seq_lens}}]]
[[default_policy/model_1/lstm/TensorArrayUnstack/strided_slice/_529]]
(1) INVALID_ARGUMENT: You must feed a value for placeholder tensor ‘default_policy/seq_lens’ with dtype int32 and shape [?]
[[{{node default_policy/seq_lens}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/tf_run_builder.py”, line 42, in get
self._executed = _run_timeline(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/tf_run_builder.py”, line 102, in _run_timeline
fetches = sess.run(ops, feed_dict=feed_dict)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 968, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1191, in _run
results = self._do_run(handle, final_targets, final_fetches,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1371, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1397, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node ‘default_policy/seq_lens’ defined at (most recent call last):
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 86, in _run_code
exec(code, run_globals)
File “/home/jovyan/Level-3-Backtest-Engine-RL/reinforcement_learning/test_loops/solve_lstm.py”, line 147, in
trained_strategy = PPOTrainer(config=base_config)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 441, in init
super().init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py”, line 169, in init
self.setup(copy.deepcopy(self.config))
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 566, in setup
self.workers = WorkerSet(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 169, in init
self._setup(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 259, in _setup
self._local_worker = self._make_worker(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 941, in _make_worker
worker = cls(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 712, in init
self._build_policy_map(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 1970, in _build_policy_map
self.policy_map.create_policy(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/policy_map.py”, line 146, in create_policy
policy = create_policy_for_framework(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/policy.py”, line 117, in create_policy_for_framework
return policy_class(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_tf_policy.py”, line 83, in init
base.init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 87, in init
self._init_state_inputs(existing_inputs)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 450, in _init_state_inputs
self._seq_lens = tf1.placeholder(
Node: ‘default_policy/seq_lens’
Detected at node ‘default_policy/seq_lens’ defined at (most recent call last):
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 86, in _run_code
exec(code, run_globals)
File “/home/jovyan/Level-3-Backtest-Engine-RL/reinforcement_learning/test_loops/solve_lstm.py”, line 147, in
trained_strategy = PPOTrainer(config=base_config)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 441, in init
super().init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py”, line 169, in init
self.setup(copy.deepcopy(self.config))
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 566, in setup
self.workers = WorkerSet(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 169, in init
self._setup(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 259, in _setup
self._local_worker = self._make_worker(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 941, in _make_worker
worker = cls(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 712, in init
self._build_policy_map(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 1970, in _build_policy_map
self.policy_map.create_policy(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/policy_map.py”, line 146, in create_policy
policy = create_policy_for_framework(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/policy.py”, line 117, in create_policy_for_framework
return policy_class(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_tf_policy.py”, line 83, in init
base.init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 87, in init
self._init_state_inputs(existing_inputs)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 450, in _init_state_inputs
self._seq_lens = tf1.placeholder(
Node: ‘default_policy/seq_lens’
2 root error(s) found.
(0) INVALID_ARGUMENT: You must feed a value for placeholder tensor ‘default_policy/seq_lens’ with dtype int32 and shape [?]
[[{{node default_policy/seq_lens}}]]
[[default_policy/model_1/lstm/TensorArrayUnstack/strided_slice/_529]]
(1) INVALID_ARGUMENT: You must feed a value for placeholder tensor ‘default_policy/seq_lens’ with dtype int32 and shape [?]
[[{{node default_policy/seq_lens}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for ‘default_policy/seq_lens’:
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 86, in _run_code
exec(code, run_globals)
File “/home/jovyan/Level-3-Backtest-Engine-RL/reinforcement_learning/test_loops/solve_lstm.py”, line 147, in
trained_strategy = PPOTrainer(config=base_config)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 441, in init
super().init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py”, line 169, in init
self.setup(copy.deepcopy(self.config))
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 566, in setup
self.workers = WorkerSet(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 169, in init
self._setup(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 259, in _setup
self._local_worker = self._make_worker(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 941, in _make_worker
worker = cls(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 712, in init
self._build_policy_map(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 1970, in _build_policy_map
self.policy_map.create_policy(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/policy_map.py”, line 146, in create_policy
policy = create_policy_for_framework(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/policy.py”, line 117, in create_policy_for_framework
return policy_class(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_tf_policy.py”, line 83, in init
base.init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 87, in init
self._init_state_inputs(existing_inputs)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 450, in _init_state_inputs
self._seq_lens = tf1.placeholder(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/ops/array_ops.py”, line 3343, in placeholder
return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/ops/gen_array_ops.py”, line 6898, in placeholder
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/framework/op_def_library.py”, line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py”, line 3798, in _create_op_internal
ret = Operation(

Traceback (most recent call last):
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1378, in _do_call
return fn(*args)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1361, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1454, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) INVALID_ARGUMENT: You must feed a value for placeholder tensor ‘default_policy/seq_lens’ with dtype int32 and shape [?]
[[{{node default_policy/seq_lens}}]]
[[default_policy/model_1/lstm/TensorArrayUnstack/strided_slice/_529]]
(1) INVALID_ARGUMENT: You must feed a value for placeholder tensor ‘default_policy/seq_lens’ with dtype int32 and shape [?]
[[{{node default_policy/seq_lens}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 86, in _run_code
exec(code, run_globals)
File “/home/jovyan/Level-3-Backtest-Engine-RL/reinforcement_learning/test_loops/solve_lstm.py”, line 186, in
action = trained_strategy.compute_single_action(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 1495, in compute_single_action
action, state, extra = policy.compute_single_action(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/policy.py”, line 466, in compute_single_action
out = self.compute_actions_from_input_dict(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/tf_policy.py”, line 326, in compute_actions_from_input_dict
fetched = builder.get(to_fetch)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/tf_run_builder.py”, line 55, in get
raise e
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/tf_run_builder.py”, line 42, in get
self._executed = _run_timeline(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/tf_run_builder.py”, line 102, in _run_timeline
fetches = sess.run(ops, feed_dict=feed_dict)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 968, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1191, in _run
results = self._do_run(handle, final_targets, final_fetches,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1371, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/client/session.py”, line 1397, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node ‘default_policy/seq_lens’ defined at (most recent call last):
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 86, in _run_code
exec(code, run_globals)
File “/home/jovyan/Level-3-Backtest-Engine-RL/reinforcement_learning/test_loops/solve_lstm.py”, line 147, in
trained_strategy = PPOTrainer(config=base_config)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 441, in init
super().init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py”, line 169, in init
self.setup(copy.deepcopy(self.config))
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 566, in setup
self.workers = WorkerSet(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 169, in init
self._setup(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 259, in _setup
self._local_worker = self._make_worker(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 941, in _make_worker
worker = cls(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 712, in init
self._build_policy_map(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 1970, in _build_policy_map
self.policy_map.create_policy(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/policy_map.py”, line 146, in create_policy
policy = create_policy_for_framework(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/policy.py”, line 117, in create_policy_for_framework
return policy_class(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_tf_policy.py”, line 83, in init
base.init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 87, in init
self._init_state_inputs(existing_inputs)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 450, in _init_state_inputs
self._seq_lens = tf1.placeholder(
Node: ‘default_policy/seq_lens’
Detected at node ‘default_policy/seq_lens’ defined at (most recent call last):
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 86, in _run_code
exec(code, run_globals)
File “/home/jovyan/Level-3-Backtest-Engine-RL/reinforcement_learning/test_loops/solve_lstm.py”, line 147, in
trained_strategy = PPOTrainer(config=base_config)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 441, in init
super().init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py”, line 169, in init
self.setup(copy.deepcopy(self.config))
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 566, in setup
self.workers = WorkerSet(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 169, in init
self._setup(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 259, in _setup
self._local_worker = self._make_worker(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 941, in _make_worker
worker = cls(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 712, in init
self._build_policy_map(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 1970, in _build_policy_map
self.policy_map.create_policy(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/policy_map.py”, line 146, in create_policy
policy = create_policy_for_framework(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/policy.py”, line 117, in create_policy_for_framework
return policy_class(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_tf_policy.py”, line 83, in init
base.init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 87, in init
self._init_state_inputs(existing_inputs)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 450, in _init_state_inputs
self._seq_lens = tf1.placeholder(
Node: ‘default_policy/seq_lens’
2 root error(s) found.
(0) INVALID_ARGUMENT: You must feed a value for placeholder tensor ‘default_policy/seq_lens’ with dtype int32 and shape [?]
[[{{node default_policy/seq_lens}}]]
[[default_policy/model_1/lstm/TensorArrayUnstack/strided_slice/_529]]
(1) INVALID_ARGUMENT: You must feed a value for placeholder tensor ‘default_policy/seq_lens’ with dtype int32 and shape [?]
[[{{node default_policy/seq_lens}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for ‘default_policy/seq_lens’:
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File “/opt/conda/envs/tfenv/lib/python3.10/runpy.py”, line 86, in _run_code
exec(code, run_globals)
File “/home/jovyan/Level-3-Backtest-Engine-RL/reinforcement_learning/test_loops/solve_lstm.py”, line 147, in
trained_strategy = PPOTrainer(config=base_config)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 441, in init
super().init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py”, line 169, in init
self.setup(copy.deepcopy(self.config))
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py”, line 566, in setup
self.workers = WorkerSet(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 169, in init
self._setup(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 259, in _setup
self._local_worker = self._make_worker(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py”, line 941, in _make_worker
worker = cls(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 712, in init
self._build_policy_map(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py”, line 1970, in _build_policy_map
self.policy_map.create_policy(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/policy_map.py”, line 146, in create_policy
policy = create_policy_for_framework(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/utils/policy.py”, line 117, in create_policy_for_framework
return policy_class(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_tf_policy.py”, line 83, in init
base.init(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 87, in init
self._init_state_inputs(existing_inputs)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/ray/rllib/policy/dynamic_tf_policy_v2.py”, line 450, in _init_state_inputs
self._seq_lens = tf1.placeholder(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/ops/array_ops.py”, line 3343, in placeholder
return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/ops/gen_array_ops.py”, line 6898, in placeholder
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/framework/op_def_library.py”, line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File “/opt/conda/envs/tfenv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py”, line 3798, in _create_op_internal
ret = Operation(

Hi @Mirakolix_Gallier,

Try this example:

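The replier's example was not captured in this thread. The `seq_lens` error above typically means the recurrent state was never passed in: with an LSTM wrapper, `compute_single_action()` needs `state=` and the returned state must be fed back on the next step. The loop below sketches that pattern; `FakeLSTMPolicy` is a stand-in (an assumption, so the snippet runs without Ray) with the same call/return shape — with RLlib you would use `policy = trainer.get_policy()`, `policy.get_initial_state()`, and `trainer.compute_single_action(obs, state=state)` instead.

```python
# Sketch of the recurrent inference loop an LSTM policy expects.
# FakeLSTMPolicy is a hypothetical stand-in for trainer.get_policy().
import numpy as np

CELL_SIZE = 128  # matches lstm_cell_size from the post


class FakeLSTMPolicy:
    """Stand-in mimicking an RLlib LSTM policy's state interface."""

    def get_initial_state(self):
        # RLlib returns [h, c]: zero vectors of length lstm_cell_size.
        return [np.zeros(CELL_SIZE, np.float32),
                np.zeros(CELL_SIZE, np.float32)]

    def compute_single_action(self, obs, state):
        # A real policy would run the LSTM; here we just advance the
        # state so the threading pattern is visible.
        new_state = [s + 1.0 for s in state]
        return 0, new_state, {}


policy = FakeLSTMPolicy()
state = policy.get_initial_state()  # reset state at episode start
for _ in range(3):  # one env step per iteration
    obs = np.zeros(40, np.float32)  # obs shape (40,) as in the post
    # Key point: pass the state in, and keep the returned state.
    action, state, _ = policy.compute_single_action(obs, state=state)
```

Calling `compute_single_action(obs)` without `state=` leaves the `seq_lens` placeholder unfed in the tf graph, which matches the INVALID_ARGUMENT error above.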