Policy rollout on Ray Tune 2.0

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am using Ray 2.0 to train a PPO policy in a custom environment. I want to deploy (roll out) my trained policy in that environment.

My training is as follows:

from ray import air, tune

train_steps = 10000
experiment_name = 'custom_env'

# `config` is the PPOConfig object built earlier (not shown here).
tuner = tune.Tuner(
    "PPO",  # to run with Tune
    param_space=config.to_dict(),
    run_config=air.RunConfig(
        name=experiment_name,
        # Without this stop condition, training runs forever.
        stop={"timesteps_total": train_steps},
        # verbose=2,
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=50,
            checkpoint_at_end=True,
        ),
    ),
)
results = tuner.fit()

# Get the best result based on a particular metric.
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")

# Get the best checkpoint corresponding to the best result and save the checkpoint path.
best_checkpoint = best_result.checkpoint
print(f"Trained model saved at {best_checkpoint}")

Questions

Now I want to use this trained policy in a different script to do inference, as below, but I don’t know:

Q1) How do I load the trained policy?
Q2) Do I have to register the environment again?

# Inference
obs = env.reset()  # reset the env to get the initial observation
episode_reward = 0
done = False
while not done:
    action = algo.compute_single_action(obs)  # here I don't know what algo is
    obs, reward, done, info = env.step(action)  # here I don't know how to call my custom env
    episode_reward += reward

print('episode_reward:', episode_reward)

Hi Lucia, I think this is you?

Short answer: yes, you have to register your environment again if it is in another script.
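For example, registration in the inference script could look like this (a minimal sketch; my_envs and CustomEnv are placeholders for your own env module and class):

from ray.tune.registry import register_env

from my_envs import CustomEnv  # hypothetical module; import your own env class


def env_creator(env_config):
    return CustomEnv(env_config)


# This has to be called in the inference script as well, so that RLlib
# can resolve the name "custom_env" when building the algorithm.
register_env("custom_env", env_creator)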

Here you see how inference is done with RLlib agents: Serving RLlib Models — Ray 2.2.0

You do not need the server, but you need to restore the algorithm and then use its compute_single_action() function.

Hope this helps
Simon

Yes! Thank you very much @Lars_Simon_Zehnder

How do I restore the algorithm?

Because in that example, I see that they do:

checkpoint_dir = algo.save("/tmp/rllib_checkpoint")

But I am training with tuner.fit().

And this one is giving me an error:
https://docs.ray.io/en/latest/rllib/rllib-training.html

from ray.rllib.algorithms.algorithm import Algorithm
checkpoint_path = "..."  # the path where my ray_results are stored
algo = Algorithm.from_checkpoint(checkpoint_path)

Error: Class 'Algorithm' has no 'from_checkpoint' member

@Username1,

do you know what a checkpoint is in RLlib? If you look into the directory where Tune stored all the checkpoints and the TensorBoard events, i.e. ray_results (probably ray_results/PPO/PPO_<Hyperpar1>_<Hyperpar2>_..._<Timestamp>), you will see every checkpoint that was written.
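For example, a quick way to list just the checkpoint directories (the experiment path below is a placeholder; point it at your own):

import os

# Placeholder path; adjust to your own experiment directory.
exp_dir = os.path.expanduser("~/ray_results/custom_env")
for root, dirs, files in os.walk(exp_dir):
    for d in dirs:
        if d.startswith("checkpoint_"):
            print(os.path.join(root, d))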

In your case you could also look at best_result.logdir to see where the best checkpoint is stored. Then you use:

from ray.rllib.algorithms.ppo import PPOConfig

# Put here the same configuration as used in training.
algo_config = (
    PPOConfig()
    .rollouts(...)
    ...
)
# The environment needs to be registered before you use it here.
algo = algo_config.build(env="custom_env")
algo.restore(checkpoint_path)
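(Side note: if I remember correctly, Algorithm.from_checkpoint() was only added to RLlib after the 2.0 release, which would explain the "no 'from_checkpoint' member" error on Ray 2.0; rebuilding the config and calling restore() as above avoids it.)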

You then get the observations from your environment and get an action from the algorithm via

obs = ...  # get the observation from your environment (e.g. env.reset())
action = algo.compute_single_action(obs)
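Putting the pieces together, a rough end-to-end inference script could look like this (untested sketch: my_envs, CustomEnv, and the exact config calls are placeholders for whatever you used in training, and I assume the old gym API with a 4-tuple step return):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env

from my_envs import CustomEnv  # hypothetical module; import your own env class

# Register the env under the same name that was used in training.
register_env("custom_env", lambda env_config: CustomEnv(env_config))

# Rebuild the algorithm with the same config as in training.
algo = PPOConfig().build(env="custom_env")

# Point this at one of the checkpoint_... directories under ray_results.
checkpoint_path = "..."
algo.restore(checkpoint_path)

# Roll out one episode.
env = CustomEnv({})
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    action = algo.compute_single_action(obs)
    obs, reward, done, info = env.step(action)
    episode_reward += reward

print("episode_reward:", episode_reward)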

I suggest reading carefully through the RLlib documentation and studying the checkpoint directory that Tune has created, to get an understanding of how RLlib works.

Thank you @Lars_Simon_Zehnder

I am getting the checkpoint directory as recommended here, but I still get the same error:

# Get the best result based on a particular metric.
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")

# Get the best checkpoint corresponding to the best result and save the checkpoint path.
best_checkpoint = best_result.checkpoint
print(f"Trained model saved at {best_checkpoint}")