[rllib] How to infer Dict type obs with exported model?


RLlib uses StructuredTensor to deal with complex obs spaces.

What’s the problem?

I used trainer.get_policy().export_model() to export an RLlib checkpoint into a TF SavedModel (.pb files). The export process works well, but when I want to use the exported model for inference, I find that the observations input requires a Tensor, while my input_dict is a Dict with an action mask. So, should I convert my dict inputs into a StructuredTensor?


The input_dict is a Dict like this:

{"avail_action": np.array([0.0] * 59), "action_mask": np.array([0.0] * 59), "state": np.zeros(shape=(4, 27, 8))}

import tensorflow as tf

# exported_model_path is the directory written by export_model().
predict_fn = tf.saved_model.load(exported_model_path)
infer = predict_fn.signatures["serving_default"]
outputs = infer(
    observations=tf.constant([input_dict["state"].tolist()], dtype=tf.float32),
    prev_reward=tf.constant(0.0),
    is_training=tf.constant(False),
    seq_lens=tf.constant(0, dtype=tf.int32),
    prev_action=tf.constant(1, dtype=tf.int64),
)

The required observations input looks like this:

As we can see, observations is the flattened input_dict, but I can't find a way to convert a Dict into a StructuredTensor.
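For what it's worth, a minimal sketch of the flattening for this particular input_dict, assuming the obs space is a gym Dict space (which stores its keys in sorted order, so the flattened vector is each entry flattened and concatenated in alphabetical key order):

```python
import numpy as np

# The same Dict observation as above.
input_dict = {
    "avail_action": np.array([0.0] * 59),
    "action_mask": np.array([0.0] * 59),
    "state": np.zeros(shape=(4, 27, 8)),
}

# Flatten each entry and concatenate in sorted key order:
# action_mask, avail_action, state.
flat_obs = np.concatenate(
    [np.asarray(input_dict[k], dtype=np.float32).ravel()
     for k in sorted(input_dict)]
)
print(flat_obs.shape)  # (982,) = 59 + 59 + 4*27*8
```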

Any suggestion would be helpful!

Great question! For the inference, are you using Trainer.compute_actions, Policy.compute_actions, or Policy.compute_single_action, or Trainer.compute_action? :confused: Yes, we need to clean this up a little and make it more intuitive. The Trainer methods will actually accept an observation from the environment, then preprocess it (e.g. flatten the dict). The Policy methods require an already preprocessed input.

Yes! If I use Trainer.compute_action(), the trainer can deal with the Dict type input (except for one error, for which I submitted an issue: Trainer.compute_action Error with Dict type observation inputs).

But when I export the TF model from PPO.Trainer (getting the .pb file), the Policy methods require the preprocessed input, and I don't know how to convert a Dict input into a flattened input.

So, can RLlib provide a function that does this preprocessing (flattening the dict)? Or, if you can tell me the details of the flatten operation, maybe I can implement it myself.

Thank you guys for building such an excellent project! :grinning:

Hey @hybug.
Yes, the Policy object always requires the already flattened obs as inputs into its compute_action methods.
The Trainer will do the preprocessing itself, so you could simply use Trainer.compute_action or Trainer.compute_actions and pass in the Dict observation.
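If you do need to preprocess manually at serving time (e.g. without a Trainer around), RLlib exposes preprocessors in ray.rllib.models.preprocessors (get_preprocessor returns the right class for a space). The flatten logic it applies to Dict observations can also be sketched without importing ray, assuming keys are visited in sorted order as in gym's Dict space; flatten_obs here is a hypothetical helper name, not a library function:

```python
import numpy as np

def flatten_obs(obs):
    """Recursively flatten a (possibly nested) dict observation into one
    1-D float32 vector, visiting keys in sorted order. A sketch of the
    concatenation order RLlib's dict preprocessor uses, not the library
    implementation itself."""
    if isinstance(obs, dict):
        return np.concatenate([flatten_obs(obs[k]) for k in sorted(obs)])
    return np.asarray(obs, dtype=np.float32).ravel()

# Example: the nested pieces collapse into one 982-element vector.
obs = {
    "action_mask": np.zeros(59),
    "avail_action": np.zeros(59),
    "state": np.zeros((4, 27, 8)),
}
print(flatten_obs(obs).shape)  # (982,)
```

The resulting vector can then be fed as the observations tensor of the exported SavedModel's serving signature.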

I fixed the issue. The bug was that the spaces for the local worker were not translated into their original forms, so the local worker did not create a proper preprocessor to be used with Trainer.compute_action().
