Can anyone actually see the post?
OK, so since the post doesn't look like it's showing, here it is again:
Medium: It contributes significant difficulty to completing my task, but I can work around it.
Hi, so I've run into a problem where I can't work out how to compute actions after training. What I want to do is train a model using PPO and then run through my environment again, but this time rendering each step so that I can see what is happening, and also be able to give it data from outside the environment and have it make decisions based on those observations. Below is the configuration and Tune setup I am using, as well as where I fetch the best trial's checkpoint. The problem is that when I run the code I get:
Traceback (most recent call last):
File "src\Control.py", line 199, in <module>
action = agent.compute_single_action(state)
File "\anaconda3\envs\Poker_Bot\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 1787, in compute_single_action
policy = self.get_policy(policy_id)
File "\anaconda3\envs\Poker_Bot\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2037, in get_policy
return self.workers.local_worker().get_policy(policy_id)
AttributeError: 'SingleAgentEnvRunner' object has no attribute 'get_policy'
It feels like it should be such an easy thing to do, but I've been broken by this xD. Alongside this problem, I am also curious whether I have set this configuration up correctly, as I want my custom model to be the one making the decisions about which action to take given an observation.
# create the configuration
config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(env_name)
    .framework("torch")
    .env_runners(
        num_env_runners=1,
        num_envs_per_env_runner=1,
        num_gpus_per_env_runner=1,
        # env_to_module_connector=_env_to_module,
    )
    .resources(
        num_cpus_for_main_process=1,
    )
    .rl_module(
        model_config_dict={
            "custom_model": "small_testing_agent",
        },
        rl_module_spec=SingleAgentRLModuleSpec(),
    )
)
# train the agent
analysis = tune.run(
    "PPO",
    config=config.to_dict(),
    stop={"training_iteration": 2},  # stop conditions
    checkpoint_at_end=True,          # save checkpoint at end
)
# get best trial
best_trial = analysis.get_best_trial("episode_reward_mean", mode="max")
# have it do one run-through of the environment and then render it
agent = PPO(config=config.to_dict(), env=env_name)
agent.restore(best_trial.checkpoint)
# run the agent
state, info = env.reset()
done = False
while not done:
    action = agent.compute_single_action(state)
    print("Taking action: ", action)
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    env.render()
    print(f"Action: {action}, Reward: {reward}")
I encountered the same issue. You can see my post on it here. Unfortunately, I don’t have a solution yet. Hopefully this will get attention soon - it seems like such a basic capability.
Yeah, I ran into that error message early on with RLlib as well. The official way of loading something from a checkpoint for inference can be found in the /examples section of RLlib on GitHub (and I've got example code here, if you'd like to use it; the ObsVectorizationWrapper is a workaround for a crash associated with Repeated observation spaces).
Basically, what's going wrong in your code is that you're querying the environment runner (which collects data for training the agent) instead of the agent itself.
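For reference, the rough pattern on the new API stack is to restore the trained RLModule from the checkpoint and query it directly instead of calling compute_single_action. Here is a minimal sketch rather than the exact code from my repo; checkpoint_path and the sub-directory layout inside the checkpoint are assumptions that differ between Ray versions, and it assumes a discrete action space:

import pathlib

import numpy as np
import torch

from ray.rllib.core.rl_module.rl_module import RLModule

# Where the module lives inside the checkpoint is version dependent; adjust
# this to what you actually find under your checkpoint directory.
module_path = (
    pathlib.Path(checkpoint_path)
    / "learner_group" / "learner" / "rl_module" / "default_policy"
)
rl_module = RLModule.from_checkpoint(module_path)

obs, info = env.reset()
# forward_inference expects a batch, so add a leading batch dimension.
batch = {"obs": torch.from_numpy(np.expand_dims(obs, 0)).float()}
out = rl_module.forward_inference(batch)
# PPO's default module returns action-distribution inputs (logits);
# taking the argmax gives a greedy action for a discrete action space.
action = int(torch.argmax(out["action_dist_inputs"][0]))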
@MCW_Lad, @DreamerJ1, @mchomlin: you are all using the new API stack, right? Which Ray version are you using?
Yep, new API stack. I’m using the latest version of Ray. The example code I wrote should work out of the box on the environment in the same repo, and the files in rllib/examples/inference are also pretty helpful for what you’re trying to do.
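Once the RLModule is restored as in the snippet above, your render loop only changes in how the action is computed. A sketch under the same assumptions (Gymnasium-style env returning numpy observations, discrete action space, rl_module taken from the previous snippet):

import numpy as np
import torch

obs, info = env.reset()
done = False
while not done:
    # Batch the single observation, run the module, and act greedily on the logits.
    batch = {"obs": torch.from_numpy(np.expand_dims(obs, 0)).float()}
    out = rl_module.forward_inference(batch)
    action = int(torch.argmax(out["action_dist_inputs"][0]))
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    env.render()
    print(f"Action: {action}, Reward: {reward}")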