get_policy error when getting an action from a restored trained model - New API stack

…\anaconda3\envs\rllibpy311\Lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2109, in get_policy
return self.env_runner.get_policy(policy_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SingleAgentEnvRunner' object has no attribute 'get_policy'

Hi,
This error has been blocking me for a while. As far as I have searched here, there is no reliable answer so far.
Is there any standard way (or example) to restore a trained model and get an action from it?
The problem happens when I try to restore my model with both PPO and DQN. Any idea how it can be solved?

Hi there, for debugging purposes, which version of the RLlib API are you running right now? And is there a way for me to reproduce this error (do you have a code snippet to share)?

Hi Christina,
I refer you to this issue, which is essentially the same and has remained unsolved for a long time:

Hi Ali,
Thanks for posting the Github issue - let's continue to track and discuss it there. In the meantime, if you have any more data about this issue, I encourage you to post it in the Github issue!
Christina

Hi Christina,
as you can see there, there hasn't been any reliable solution from the Ray team since April 2024.
Thanks

Hello Christina,
Is it possible to raise the priority of the issue mentioned by Ali_Zargarian to P0? Or to provide an official workaround describing the correct procedure for loading a model/policy from a checkpoint?
Thanks in advance.


Hello Christina,
some additional information from my side:

  • ray 2.44.1
  • new RLlib API stack
  • the checkpoint I'm restoring from was created by tune.Tuner (a rough sketch of how I obtain its path is below)

Please let me know if you need any additional information.
Thanks in advance.
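A rough sketch of how the checkpoint is produced and where its path comes from (the environment, stopping criterion, and metric name here are just placeholders, not my actual setup):

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")
tuner = tune.Tuner(
    "PPO",
    param_space=config,
    # In some Ray versions RunConfig lives in ray.train / ray.air instead.
    run_config=tune.RunConfig(stop={"training_iteration": 5}),
)
results = tuner.fit()
best_result = results.get_best_result(
    metric="env_runners/episode_return_mean", mode="max"
)
# This path is what I later pass to Algorithm.from_checkpoint().
checkpoint_path = best_result.checkpoint.path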

Hi Mike, thank you for the additional info! I’ll go discuss with the team and see if there’s anything that I can do re: the Github issue.

Hello Christina, I did a deep dive into the source code. Please update the documentation: the Github issue stays in the area of deprecated code (the old API), and the documentation is also outdated.

Here is an example of the correct approach for the new API.

Example: how to restore an RL model with its weights. The IMPALA class can be replaced with any other Algorithm class.

from ray.rllib.algorithms.impala import IMPALA  # adjust the import to your Algorithm class

model = IMPALA.from_checkpoint(self.checkpoint_path)
rl_module = model.get_module()

Example: how to get an action for a Discrete action space.

import torch

rl_module = self.rl_module
# Batch the single observation so the module gets shape (1, obs_dim).
fwd_ins = {"obs": torch.Tensor(observation).unsqueeze(0)}
fwd_outputs = rl_module.forward_inference(fwd_ins)
# Build the action distribution from the module's logits and sample an action.
action_dist_class = rl_module.get_inference_action_dist_cls()
action_dist = action_dist_class.from_logits(
    fwd_outputs["action_dist_inputs"]
)
action = action_dist.sample()[0].numpy()

The processing of fwd_outputs has to match the type of the action space as well as the network's output layer.
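For example, two common variants, reusing fwd_outputs and action_dist from the snippet above (env is assumed to be a Gym-style environment; this is a sketch, not the only way to do it):

import numpy as np
import torch

# Greedy (deterministic) action for a Discrete space: take the argmax of the
# logits instead of sampling.
logits = fwd_outputs["action_dist_inputs"]
greedy_action = int(torch.argmax(logits, dim=-1)[0])

# For a continuous (Box) space the same from_logits()/sample() pattern applies
# (get_inference_action_dist_cls() then returns a Gaussian-type distribution),
# but the sampled action may need clipping to the space bounds:
action = action_dist.sample()[0].detach().numpy()
action = np.clip(action, env.action_space.low, env.action_space.high)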

Hello @Mike2
thanks for sharing this info.
How are the results from your restored trained model? Satisfactory?

Hello @Ali_Zargarian.
The results are good.

Some time ago I encountered an action prediction accuracy issue, both during training and after restoring the model from a checkpoint. The problem was related to data preprocessing (normalization). Once I implemented my own Scaler for observation preprocessing, the issue was resolved.
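Roughly what I mean by the Scaler (a simplified sketch; the actual bounds are specific to my observations):

import numpy as np

class Scaler:
    """Min-max scales raw observations into [0, 1] using known feature bounds."""

    def __init__(self, low, high):
        self.low = np.asarray(low, dtype=np.float32)
        self.high = np.asarray(high, dtype=np.float32)

    def transform(self, obs):
        # The same scaling has to be applied during training and during
        # inference on the restored model, otherwise the module sees
        # differently distributed inputs.
        return (np.asarray(obs, dtype=np.float32) - self.low) / (self.high - self.low)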

@Mike2
Many thanks for your quick reply.

Hello @Ali_Zargarian,
I’m happy to help you, but I have a really busy schedule. Sorry for that.

Please take a look at the response from simonsays1980 in the original Github issue.

simonsays1980: The important part is now: RLlib uses pre- and post-processing of data. For the pre-processing (converting for example to a multi-agent batch and applying observation filters, like e.g. MeanStdFilter) the EnvToModulePipeline is used, and for the post-processing the ModuleToEnvPipeline is used.

I guess it will help you a lot to make your custom environment more predictable.
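For reference, this is roughly how such a filter is attached to the env-to-module pipeline in the new API stack (a sketch based on recent Ray examples; the import path and callback signature may differ between versions):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import MeanStdFilter

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Prepend a MeanStdFilter to the EnvToModulePipeline so observations are
    # normalized before they reach the RLModule.
    .env_runners(env_to_module_connector=lambda env: MeanStdFilter())
)

If training used such a filter, manual inference through the raw RLModule has to apply the same normalization to the observations, which is what my Scaler covers.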

@christina, thank you for your assistance.
