How to compute actions with RLlib and Tune after training

Severity of the issue: Medium. It contributes significant difficulty to completing my task, but I can work around it.

Hi :slight_smile: I've run into a problem where I can't work out how to compute actions after training. What I want to do is train a model using PPO and then run through my environment again, this time rendering each step so that I can see what is happening, and eventually feed it observations from outside the environment and have it make decisions based on them. Below are the configuration and the Tune call I am using, as well as the code that loads the best trial's checkpoint. When I run it I get:

Traceback (most recent call last):
  File "src\Control.py", line 199, in <module>
    action = agent.compute_single_action(state)
  File "\anaconda3\envs\Poker_Bot\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 1787, in compute_single_action
    policy = self.get_policy(policy_id)
  File "\anaconda3\envs\Poker_Bot\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2037, in get_policy
    return self.workers.local_worker().get_policy(policy_id)
AttributeError: 'SingleAgentEnvRunner' object has no attribute 'get_policy'
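As far as I can tell, the error is consistent with the new API stack: with enable_env_runner_and_connector_v2=True, the workers are SingleAgentEnvRunner instances that hold an RLModule rather than Policy objects, so get_policy() (which compute_single_action() calls internally, as the traceback shows) no longer exists there. A minimal sketch of reaching the trained module instead, assuming a recent Ray 2.x where Algorithm.get_module() is available:

# Assumption (recent Ray 2.x, new API stack): the Algorithm exposes its
# trained RLModule directly; "default_policy" is the default module id.
rl_module = agent.get_module("default_policy")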

It feels like it should be such an easy thing to do, but I've been broken by this xD. Alongside this problem, I am also curious whether I have set the configuration up correctly, since I want my custom model to be the one making the decisions about which action to take given an observation.
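On the custom-model question: my understanding is that on the new API stack the "custom_model" key in model_config_dict is ignored, and a custom model is instead supplied as an RLModule subclass via the spec. A hedged sketch, where SmallTestingModule, its import path, and its config keys are all placeholders for your own TorchRLModule:

from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

# Placeholder import: your own TorchRLModule subclass that implements
# setup() and the _forward_inference/_forward_exploration/_forward_train methods.
from small_testing_agent import SmallTestingModule  # assumption

rl_module_spec = SingleAgentRLModuleSpec(
    module_class=SmallTestingModule,
    model_config_dict={"hidden_dim": 64},  # placeholder kwargs read in setup()
)

That spec would then be passed as .rl_module(rl_module_spec=rl_module_spec) in the config below.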

# imports used by the snippets below
from ray import tune
from ray.rllib.algorithms.ppo import PPO, PPOConfig
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

# create the configuration
config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(env_name)
    .framework("torch")
    .env_runners(
        num_env_runners=1,
        num_envs_per_env_runner=1,
        num_gpus_per_env_runner=1,
        # env_to_module_connector=_env_to_module,
    )
    .resources(
        num_cpus_for_main_process=1,
    )
    .rl_module(
        model_config_dict={
            "custom_model": "small_testing_agent",
        },
        rl_module_spec=SingleAgentRLModuleSpec(),
    )
)
    
# train the agent
analysis = tune.run(
    "PPO",
    config=config.to_dict(),
    stop={"training_iteration": 2},  # stop conditions
    checkpoint_at_end=True,  # save a checkpoint at the end of training
)

# get the best trial
# (on the new API stack this metric may be reported as
# "env_runners/episode_return_mean" instead)
best_trial = analysis.get_best_trial("episode_reward_mean", mode="max")

# rebuild the algorithm and restore the best checkpoint so it can
# do one run-through of the environment and render it
agent = PPO(config=config)  # the env is already set in the config
agent.restore(best_trial.checkpoint)

# run the agent through one episode
state, info = env.reset()  # Gymnasium reset returns (obs, info)
done = False
while not done:
    action = agent.compute_single_action(state)  # this line raises the error above
    print("Taking action: ", action)
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    env.render()
    print(f"Action: {action}, Reward: {reward}")

I encountered the same issue. You can see my post on it here. Unfortunately, I don’t have a solution yet. Hopefully this will get attention soon - it seems like such a basic capability.