get_policy error when getting an action from a restored trained model - New API stack

…\anaconda3\envs\rllibpy311\Lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2109, in get_policy
return self.env_runner.get_policy(policy_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SingleAgentEnvRunner' object has no attribute 'get_policy'

Hi,
This error has been blocking me for a while. As far as I have searched here, there is no reliable answer so far.
Is there any standard way (or example) to restore a trained model and get an action from it?
The problem happens when I try to restore my model with both PPO and DQN. Any idea how it can be solved?

Hi there, for debugging purposes, which version of the RLlib API are you running right now? And is there a way for me to reproduce this error (do you have a code snippet to share)?

Hi Christina,
I refer you to this issue, which is essentially the same and has remained unsolved for a long time:

Hi Ali,
Thanks for posting the Github issue - let's continue to track and discuss it there. In the meantime, if you have any more data about this issue, I encourage you to post it in the Github issue!
Christina

Hi Christina,
as you can see there, there hasn't been any reliable solution from the Ray team since April 2024.
Thanks

Hello Christina,
Is it possible to raise the priority of the issue mentioned by Ali_Zargarian to P0? Or to provide an official workaround describing the correct procedure for loading a model/policy from a checkpoint?
Thanks in advance.


Hello Christina,
some additional information from my side:

  • ray 2.44.1
  • new RLlib API stack
  • the checkpoint I'm restoring from was created by tune.Tuner (a rough sketch of how I obtain its path is below)

Please let me know if you need any additional information.
Thanks in advance.
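A rough sketch of how the checkpoint is produced and where its path comes from (the environment, stopping criterion, and metric name here are just placeholders, not my actual setup):

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")
tuner = tune.Tuner(
    "PPO",
    param_space=config,
    # In some Ray versions RunConfig lives in ray.train / ray.air instead.
    run_config=tune.RunConfig(stop={"training_iteration": 5}),
)
results = tuner.fit()
best_result = results.get_best_result(
    metric="env_runners/episode_return_mean", mode="max"
)
# This path is what I later pass to Algorithm.from_checkpoint().
checkpoint_path = best_result.checkpoint.path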

Hi Mike, thank you for the additional info! I’ll go discuss with the team and see if there’s anything that I can do re: the Github issue.

Hello Christina, I did a deep dive into the source code. Please update the documentation: the Github issue stays in the area of deprecated code (the old API), and the documentation is also outdated.

Here is an example of the correct approach for the new API.

Example: how to restore an RL model with its weights. The IMPALA class can be replaced with any other Algorithm class.

from ray.rllib.algorithms.impala import IMPALA  # adjust the import to your Algorithm class

model = IMPALA.from_checkpoint(self.checkpoint_path)
rl_module = model.get_module()

Example: how to get an action for a Discrete action space.

import torch

rl_module = self.rl_module
# Batch the single observation so the module gets shape (1, obs_dim).
fwd_ins = {"obs": torch.Tensor(observation).unsqueeze(0)}
fwd_outputs = rl_module.forward_inference(fwd_ins)
# Build the action distribution from the module's logits and sample an action.
action_dist_class = rl_module.get_inference_action_dist_cls()
action_dist = action_dist_class.from_logits(
    fwd_outputs["action_dist_inputs"]
)
action = action_dist.sample()[0].numpy()

The processing of fwd_outputs has to match the type of the action space as well as the network's output layer.
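For example, two common variants, reusing fwd_outputs and action_dist from the snippet above (env is assumed to be a Gym-style environment; this is a sketch, not the only way to do it):

import numpy as np
import torch

# Greedy (deterministic) action for a Discrete space: take the argmax of the
# logits instead of sampling.
logits = fwd_outputs["action_dist_inputs"]
greedy_action = int(torch.argmax(logits, dim=-1)[0])

# For a continuous (Box) space the same from_logits()/sample() pattern applies
# (get_inference_action_dist_cls() then returns a Gaussian-type distribution),
# but the sampled action may need clipping to the space bounds:
action = action_dist.sample()[0].detach().numpy()
action = np.clip(action, env.action_space.low, env.action_space.high)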

Hello @Mike2
thanks for sharing this info.
How are the results from your restored trained model? Satisfactory?

Hello @Ali_Zargarian.
The results are good.

Some time ago I encountered an action prediction accuracy issue, both during training and after restoring the model from a checkpoint. The problem was related to data preprocessing (normalization). Once I implemented my own Scaler for observation preprocessing, the issue was resolved.
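Roughly what I mean by the Scaler (a simplified sketch; the actual bounds are specific to my observations):

import numpy as np

class Scaler:
    """Min-max scales raw observations into [0, 1] using known feature bounds."""

    def __init__(self, low, high):
        self.low = np.asarray(low, dtype=np.float32)
        self.high = np.asarray(high, dtype=np.float32)

    def transform(self, obs):
        # The same scaling has to be applied during training and during
        # inference on the restored model, otherwise the module sees
        # differently distributed inputs.
        return (np.asarray(obs, dtype=np.float32) - self.low) / (self.high - self.low)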

@Mike2
Many thanks for your quick reply.

Hello @Ali_Zargarian,
I’m happy to help you, but I have a really busy schedule. Sorry for that.

Please take a look at the response from simonsays1980 in the original Github issue.

simonsays1980: The important part is now: RLlib uses pre- and post-processing of data. For the pre-processing (converting for example to a multi-agent batch and applying observation filters, like e.g. MeanStdFilter) the EnvToModulePipeline is used, and for the post-processing the ModuleToEnvPipeline is used.

I guess it will help you a lot to make your custom environment more predictable.
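For reference, this is roughly how such a filter is attached to the env-to-module pipeline in the new API stack (a sketch based on recent Ray examples; the import path and callback signature may differ between versions):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import MeanStdFilter

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Prepend a MeanStdFilter to the EnvToModulePipeline so observations are
    # normalized before they reach the RLModule.
    .env_runners(env_to_module_connector=lambda env: MeanStdFilter())
)

If training used such a filter, manual inference through the raw RLModule has to apply the same normalization to the observations, which is what my Scaler covers.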

@christina, thank you for your assistance.
