Export LSTM model with TensorFlow: Placeholder issue

Hello everyone,

My question is related to this issue:

My goal is to train a model with the RLlib library and export it to use with TensorFlow only.

My success: export and import

I’ve succeeded in exporting the trained PPO policy (called “policy_agent”) through the following function: trainer.get_policy("policy_agent").export_model(path)
And then used it with this pseudo code:

import numpy as np
from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()

# Load the exported policy and grab its default serving signature.
model = tf.saved_model.load(path)
signature = model.signatures['serving_default']

sess = tf1.Session()
sess.run(tf1.global_variables_initializer())

# Build the inference op; each batched input gets a leading batch dimension.
compute_model = signature(
	is_training=tf.constant(np.array(0)),
	prev_action=tf.constant(np.array([self.prev_action])),
	prev_reward=tf.constant([reward]),
	timestep=tf.constant(np.array(timestep)),
	observations=tf.constant(np.array([observation]))
)
result_output = sess.run(compute_model)

The actions to perform are in the result_output variable. The code can probably be improved, but it works.
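For reference, the batched shapes being fed above can be sketched with plain NumPy (the 4-dimensional observation here is made up; the real shapes depend on your environment’s observation and action spaces):

```python
import numpy as np

# Hypothetical single-step inputs (shapes depend on your env).
observation = np.zeros(4, dtype=np.float32)  # made-up 4-dim Box observation
prev_action = 0
reward = 0.0

# The [ ... ] wrapping in the call above adds a leading batch dimension of 1.
obs_batch = np.array([observation])                  # shape (1, 4)
act_batch = np.array([prev_action], dtype=np.int64)  # shape (1,)
rew_batch = np.array([reward], dtype=np.float32)     # shape (1,)

print(obs_batch.shape, act_batch.shape, rew_batch.shape)
```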

My problem: LSTM

However, when I want to use the LSTM model (with "use_lstm": True in the config), I’m a bit confused. I use the same pseudo code, except for the placeholder format, because the ‘Placeholder:0’ and ‘Placeholder_1:0’ inputs (the LSTM state) have to be added:

placeholders = {
	"is_training": tf.constant(np.array(0)),
	"observations": tf.constant(np.array([observation])),
	"seq_lens": tf.constant([20]),
	"prev_action": tf.constant(np.array([prev_action], dtype=np.int64)),
	"prev_reward": tf.constant([reward]),
	"timestep": tf.constant(np.array(timestep)),
	"policy_agent/Placeholder:0": tf.convert_to_tensor(init_state, np.float32),
	"policy_agent/Placeholder_1:0": tf.convert_to_tensor(init_state, np.float32)
}

print(signature.inputs)

compute_model = signature(**placeholders)

The printed model inputs are:
[<tf.Tensor 'policy_agent/Placeholder:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'policy_agent/Placeholder_1:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'policy_agent/Placeholder_2:0' shape=(?, 2) dtype=int64>, <tf.Tensor 'policy_agent/is_training:0' shape=() dtype=bool>, ... ]

However, I don’t know what to set for the init_state variable (or whether the two states are supposed to be the same?). I tried:
init_state = np.asarray([np.zeros([256], np.float32) for _ in range(1)])
with different values for the range.

But I got this error, which I don’t really understand:

tensorflow.python.framework.errors_impl.InvalidArgumentError:   Max scatter index must be < array size (19 vs. 1)
	 [[{{node policy_agent/model_1/lstm/TensorArrayUnstack_1/TensorArrayScatter/TensorArrayScatterV3}}]]
	 [[StatefulPartitionedCall_1]]

Configuration

I’m using Ubuntu 20.04, with the latest RLlib wheel available for Linux / Python 3.8.

Has anyone succeeded in exporting and running an LSTM model? Do you have any idea how to solve this issue?
Sorry for not providing minimal working code to experiment with directly, but if it’s really needed, I can put it together.

Thank you for your time, I really appreciate it!

Answering myself, in case anyone has the same problem.
The init_state variable was correct, and can be written as follows:
init_state = np.zeros(shape=[1, 256], dtype=np.float32)

The problem came from the seq_lens variable. For my application, where one observation is fed per call, it seems it should be "seq_lens": tf.constant([1]), not [20] (the “19 vs. 1” in the error is presumably the LSTM trying to scatter 20 timesteps into an array that only holds 1).
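Putting it together, the LSTM-specific inputs can be sketched in plain NumPy as follows (assuming the default lstm_cell_size of 256 and one observation per compute call):

```python
import numpy as np

cell_size = 256  # matches "lstm_cell_size" in the model config (256 is the default)
batch = 1        # one observation per call

# Zero-initialized hidden and cell state for the LSTM, shape [batch, cell_size].
# Both state inputs ('Placeholder:0' and 'Placeholder_1:0') get this same shape.
init_state = np.zeros(shape=[batch, cell_size], dtype=np.float32)

# One timestep is fed per call, so the sequence length is 1.
seq_lens = np.array([1], dtype=np.int32)

print(init_state.shape, seq_lens)
```

Note that for a proper stateful rollout, the state outputs returned by one call would normally be fed back as the state inputs of the next call, instead of being reset to zeros each time.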

Good luck with your projects!