Hello everyone,
My question is related to this issue:
My goal is to train a model with the RLlib library and export it so that it can be used with TensorFlow alone.
My success: export and import
I succeeded in exporting the trained PPO policy (called “policy_agent”) with the following call: trainer.get_policy("policy_agent").export_model(path)
I then use it with the following pseudocode:
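import numpy as np
from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()
# Load the exported SavedModel and pick its default serving signature.
model = tf.saved_model.load(path)
signature = model.signatures['serving_default']
sess = tf1.Session()
sess.run(tf1.global_variables_initializer())
# Build the output tensors for one step (a batch containing a single observation).
compute_model = signature(
    is_training=tf.constant(np.array(0)),
    prev_action=tf.constant(np.array([prev_action])),
    prev_reward=tf.constant([reward]),
    timestep=tf.constant(np.array(timestep)),
    observations=tf.constant(np.array([observation]))
)
# Evaluate the outputs in the TF1 session.
result_output = sess.run(compute_model)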
The actions to perform are then in the result_output variable. The code can probably be improved, but it works.
My problem: LSTM
However, I'm a bit confused when it comes to the LSTM model (with “use_lstm : True” in the config). I use the same pseudocode, except for the inputs, because the two extra placeholders ‘Placeholder:0’ and ‘Placeholder_1:0’ are now required:
# Inputs for the LSTM policy; the last two entries feed the recurrent state placeholders.
placeholders = {
    "is_training": tf.constant(np.array(0)),
    "observations": tf.constant(np.array([observation])),
    "seq_lens": tf.constant([20]),
    "prev_action": tf.constant(np.array([prev_action], dtype=np.int64)),
    "prev_reward": tf.constant([reward]),
    "timestep": tf.constant(np.array(timestep)),
    "policy_agent/Placeholder:0": tf.convert_to_tensor(init_state, np.float32),
    "policy_agent/Placeholder_1:0": tf.convert_to_tensor(init_state, np.float32)
}
# signature is the 'serving_default' signature loaded earlier.
print(signature.inputs)
compute_model = signature(**placeholders)
Printing the model inputs gives:
[<tf.Tensor 'policy_agent/Placeholder:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'policy_agent/Placeholder_1:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'policy_agent/Placeholder_2:0' shape=(?, 2) dtype=int64>, <tf.Tensor 'policy_agent/is_training:0' shape=() dtype=bool>, ... ]
However, I don't know what to set for the init_state variable, or whether the two state placeholders are supposed to receive the same value. I tried:
init_state = np.asarray([np.zeros([256], np.float32) for _ in range(1)])
with different values for the range.
But I got this error, which I don't really understand:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Max scatter index must be < array size (19 vs. 1)
[[{{node policy_agent/model_1/lstm/TensorArrayUnstack_1/TensorArrayScatter/TensorArrayScatterV3}}]]
[[StatefulPartitionedCall_1]]
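To make sense of the numbers in this message, here is how I currently read them. This is only a guess on my part, not something I have confirmed in the RLlib source. I assume that the flat observation batch is split evenly across the entries of seq_lens to obtain a time dimension, that every entry of seq_lens has to fit into that time dimension, and that each state placeholder expects one row per sequence. The helper names below (inferred_time_steps, check_lstm_feed) are hypothetical and only illustrate the shape arithmetic in plain Python:

```python
def inferred_time_steps(num_obs_rows, seq_lens):
    """Time steps per sequence if the flat batch is split evenly (assumption)."""
    num_seqs = len(seq_lens)
    if num_seqs == 0 or num_obs_rows % num_seqs != 0:
        raise ValueError("observation rows must be a multiple of the number of sequences")
    return num_obs_rows // num_seqs


def check_lstm_feed(num_obs_rows, seq_lens, num_state_rows):
    """Return a list of shape problems found under the assumptions above (empty if none)."""
    problems = []
    time_steps = inferred_time_steps(num_obs_rows, seq_lens)
    # Last time index that the longest declared sequence would touch.
    max_index = max(seq_lens) - 1
    if max_index >= time_steps:
        problems.append(f"max time index {max_index} vs. time dimension {time_steps}")
    # Assumed: one recurrent-state row per sequence.
    if num_state_rows != len(seq_lens):
        problems.append(f"{num_state_rows} state rows vs. {len(seq_lens)} sequences")
    return problems


# My call above: one observation row, seq_lens=[20], init_state with one row.
print(check_lstm_feed(1, [20], 1))
# The same feed with seq_lens=[1], which is consistent under these assumptions.
print(check_lstm_feed(1, [1], 1))
```

If this reading is right, the "19 vs. 1" would come from seq_lens=[20] being combined with a single observation row, so passing seq_lens=[1] for one-step inference might be worth trying. I have not verified this.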
Configuration
I'm using Ubuntu 20.04 with the latest RLlib wheel available for Linux, Python 3.8.
Has anyone succeeded in exporting and running an LSTM model? Do you have any idea how to solve this issue?
Sorry for not providing a minimal working example to experiment with directly; if it is really needed, I can put one together.
Thank you for your time, I really appreciate it!