How to compute actions with RLlib and Tune after training

DreamerJ1 · September 3, 2024, 9:34am

Can anyone actually see the post?

DreamerJ1 · September 3, 2024, 9:36am

ok so since the post doesnt look like its showing here it is again:
Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi So I run into a problem where I am unable to work out how to compute actions after training. What I want to do is train a model using PPO and then run through my environment again, but this time rendering each step so that I can see what is happening and be able to give it data from outside the environment and have it make decisions based on those observations. Below is the configuration and tune that I am using, as well as where I compute the best trials checkpoint. The problem is that when I run the code I get:

Traceback (most recent call last):
  File "src\Control.py", line 199, in <module>
    action = agent.compute_single_action(state)
  File "\anaconda3\envs\Poker_Bot\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 1787, in compute_single_action
    policy = self.get_policy(policy_id)
  File "\anaconda3\envs\Poker_Bot\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2037, in get_policy
    return self.workers.local_worker().get_policy(policy_id)
AttributeError: 'SingleAgentEnvRunner' object has no attribute 'get_policy'

It feels like it should be such an easy thing to do but I’ve been broken by this xD. Alongside this problem, I am also curious if I have set this configuration up correctly as I want my custom model to be the one to make the decisions regarding what action to take given an observation

# create the configuration
    config =
        PPOConfig()
        .api_stack(
            enable_rl_module_and_learner=True,
            enable_env_runner_and_connector_v2=True,
        )
        .environment(env_name)
        .framework("torch")
        .env_runners(
            num_env_runners=1,
            num_envs_per_env_runner=1,
            num_gpus_per_env_runner=1,
            # env_to_module_connector=_env_to_module,
        )
        .resources(
            num_cpus_for_main_process=1,
        )
        .rl_module(
            model_config_dict={
                "custom_model": "small_testing_agent",
            },
            rl_module_spec=SingleAgentRLModuleSpec(),
        )
    )
    
    # train the agent
    analysis = tune.run(
        "PPO",
        config=config.to_dict(),
        stop={"training_iteration": 2},  # stop conditions
        checkpoint_at_end=True, # save checkpoint at end
    )

    # get best trial
    best_trial = analysis.get_best_trial("episode_reward_mean", mode="max")

    # have it do one run-through of the environment and then render it
    agent = PPO(config=config.to_dict(), env=env_name)
    agent.restore(best_trial.checkpoint)

    # run the agent
    state = env.reset()
    done = False
    while not done:
        action = agent.compute_single_action(state)
        print("Taking action: ", action)
        state, reward, terminated, truncateds, info = env.step(action)
        env.render()
        print(f"Action: {action}, Reward: {reward}")

mchomlin · September 21, 2024, 1:58am

I encountered the same issue. You can see my post on it here. Unfortunately, I don’t have a solution yet. Hopefully this will get attention soon - it seems like such a basic capability.

MCW_Lad · July 14, 2025, 10:42pm

Yeah, I’ve run into that error message early on with RLlib as well. The official way of loading something from a checkpoint for inference can be found in the /examples section of RLlib on Github (and I’ve got example code here, if you’d like to use it - the ObsVectorizationWrapper is a workaround for a crash associated with Repeated observation spaces).

Basically, what’s going wrong in your code is that you’re querying the environment runner (that collects data for training the agent) instead of the agent itself.

PhilippWillms · July 15, 2025, 6:40pm

@MCW_Lad, @DreamerJ1, @mchomlin : you are all using the new API stack right? which ray version are you using?

MCW_Lad · July 15, 2025, 8:41pm

Yep, new API stack. I’m using the latest version of Ray. The example code I wrote should work out of the box on the environment in the same repo, and the files in rllib/examples/inference are also pretty helpful for what you’re trying to do.

Topic		Replies	Views
Attribute error when trying to compute actions after training DreamerV3 on Cartpole RLlib	2	468	December 4, 2023
Compute_actions for Trajectory API RLlib	11	2419	February 10, 2022
Compute/display actions from ray.tune RLlib	10	1681	March 30, 2021
Compute actions Programmatically RLlib	1	285	February 5, 2022
Get_policy error when get an action from restored trained model- New API stack	12	104	April 22, 2025

How to compute actions with RLlib and Tune after training

Related topics