Export LSTM model with TensorFlow: Placeholder issue

Hello everyone,

My question is related to this issue:

My goal is to train a model with the RLlib library and export it to use with TensorFlow only.

My success: export and import

I’ve succeeded in exporting the trained PPO policy (called “policy_agent”) through the following function: trainer.get_policy("policy_agent").export_model(path)
And then used it with this pseudo code:

import numpy as np
from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()

# Load the exported policy and grab its default serving signature.
model = tf.saved_model.load(path)
signature = model.signatures['serving_default']

sess = tf1.Session()
sess.run(tf1.global_variables_initializer())

# Build the inference op; each batched input gets a leading batch dimension.
compute_model = signature(
	is_training=tf.constant(np.array(0)),
	prev_action=tf.constant(np.array([self.prev_action])),
	prev_reward=tf.constant([reward]),
	timestep=tf.constant(np.array(timestep)),
	observations=tf.constant(np.array([observation]))
)
result_output = sess.run(compute_model)

The actions to perform are in the result_output variable. The code can probably be improved, but it works.
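For reference, the batched shapes being fed above can be sketched with plain NumPy (the 4-dimensional observation here is made up; the real shapes depend on your environment’s observation and action spaces):

```python
import numpy as np

# Hypothetical single-step inputs (shapes depend on your env).
observation = np.zeros(4, dtype=np.float32)  # made-up 4-dim Box observation
prev_action = 0
reward = 0.0

# The [ ... ] wrapping in the call above adds a leading batch dimension of 1.
obs_batch = np.array([observation])                  # shape (1, 4)
act_batch = np.array([prev_action], dtype=np.int64)  # shape (1,)
rew_batch = np.array([reward], dtype=np.float32)     # shape (1,)

print(obs_batch.shape, act_batch.shape, rew_batch.shape)
```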

My problem: LSTM

However, when I want to use the LSTM model (with "use_lstm": True in the config), I’m a bit confused. I use the same pseudo code, except for the placeholder format, because the ‘Placeholder:0’ and ‘Placeholder_1:0’ inputs (the LSTM state) have to be added:

placeholders = {
	"is_training": tf.constant(np.array(0)),
	"observations": tf.constant(np.array([observation])),
	"seq_lens": tf.constant([20]),
	"prev_action": tf.constant(np.array([prev_action], dtype=np.int64)),
	"prev_reward": tf.constant([reward]),
	"timestep": tf.constant(np.array(timestep)),
	"policy_agent/Placeholder:0": tf.convert_to_tensor(init_state, np.float32),
	"policy_agent/Placeholder_1:0": tf.convert_to_tensor(init_state, np.float32)
}

print(signature.inputs)

compute_model = signature(**placeholders)

The printed model inputs are:
[<tf.Tensor 'policy_agent/Placeholder:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'policy_agent/Placeholder_1:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'policy_agent/Placeholder_2:0' shape=(?, 2) dtype=int64>, <tf.Tensor 'policy_agent/is_training:0' shape=() dtype=bool>, ... ]

However, I don’t know what to set for the init_state variable (or whether the two states are supposed to be the same?). I tried:
init_state = np.asarray([np.zeros([256], np.float32) for _ in range(1)])
with different values for the range.

But I got this error, which I don’t really understand:

tensorflow.python.framework.errors_impl.InvalidArgumentError:   Max scatter index must be < array size (19 vs. 1)
	 [[{{node policy_agent/model_1/lstm/TensorArrayUnstack_1/TensorArrayScatter/TensorArrayScatterV3}}]]
	 [[StatefulPartitionedCall_1]]

Configuration

I’m using Ubuntu 20.04, with the latest RLlib wheel available for Linux / Python 3.8.

Has anyone succeeded in exporting and running an LSTM model? Do you have any idea how to solve this issue?
Sorry for not providing minimal working code to experiment with directly, but if it’s really needed, I can put it together.

Thank you for your time, I really appreciate it!

Answering myself, in case anyone has the same problem.
The init_state variable was correct, and can be written as follows:
init_state = np.zeros(shape=[1, 256], dtype=np.float32)

The problem came from the seq_lens variable. For my application, where one observation is fed per call, it seems it should be "seq_lens": tf.constant([1]), not [20] (the “19 vs. 1” in the error is presumably the LSTM trying to scatter 20 timesteps into an array that only holds 1).
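Putting it together, the LSTM-specific inputs can be sketched in plain NumPy as follows (assuming the default lstm_cell_size of 256 and one observation per compute call):

```python
import numpy as np

cell_size = 256  # matches "lstm_cell_size" in the model config (256 is the default)
batch = 1        # one observation per call

# Zero-initialized hidden and cell state for the LSTM, shape [batch, cell_size].
# Both state inputs ('Placeholder:0' and 'Placeholder_1:0') get this same shape.
init_state = np.zeros(shape=[batch, cell_size], dtype=np.float32)

# One timestep is fed per call, so the sequence length is 1.
seq_lens = np.array([1], dtype=np.int32)

print(init_state.shape, seq_lens)
```

Note that for a proper stateful rollout, the state outputs returned by one call would normally be fed back as the state inputs of the next call, instead of being reset to zeros each time.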

Good luck with your projects!