Policy rollout on Ray Tune 2.0

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am using Ray 2.0 to train a PPO policy in a custom environment. I want to deploy (roll out) my trained policy in that environment.

My training is as follows:

from ray import air, tune

train_steps = 10000
experiment_name = 'custom_env'

# `config` is the PPOConfig object built earlier (not shown here).
tuner = tune.Tuner(
    "PPO",  # to run with Tune
    param_space=config.to_dict(),
    run_config=air.RunConfig(
        name=experiment_name,
        # Without this stop condition, training runs forever.
        stop={"timesteps_total": train_steps},
        # verbose=2,
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=50,
            checkpoint_at_end=True,
        ),
    ),
)
results = tuner.fit()

# Get the best result based on a particular metric.
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")

# Get the best checkpoint corresponding to the best result and save the checkpoint path.
best_checkpoint = best_result.checkpoint
print(f"Trained model saved at {best_checkpoint}")

Questions

Now I want to use this trained policy in a different script to do inference, as below, but I don’t know:

Q1) How do I load the trained policy?
Q2) Do I have to register the environment again?

# Inference
obs = env.reset()  # reset the env to get the initial observation
episode_reward = 0
done = False
while not done:
    action = algo.compute_single_action(obs)  # here I don't know what algo is
    obs, reward, done, info = env.step(action)  # here I don't know how to call my custom env
    episode_reward += reward

print('episode_reward:', episode_reward)

Hi Lucia, I think this is you?

Short answer: yes, you have to register your environment again if it is in another script.
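For example, registration in the inference script could look like this (a minimal sketch; my_envs and CustomEnv are placeholders for your own env module and class):

from ray.tune.registry import register_env

from my_envs import CustomEnv  # hypothetical module; import your own env class


def env_creator(env_config):
    return CustomEnv(env_config)


# This has to be called in the inference script as well, so that RLlib
# can resolve the name "custom_env" when building the algorithm.
register_env("custom_env", env_creator)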

Here you see how inference is done with RLlib agents: Serving RLlib Models — Ray 2.2.0

You do not need the server, but you need to restore the algorithm and then use its compute_single_action() function.

Hope this helps
Simon

Yes! Thank you very much @Lars_Simon_Zehnder

How do I restore the algorithm?

Because in that example, I see that they do:

checkpoint_dir = algo.save("/tmp/rllib_checkpoint")

But I am training with tuner.fit().

And this one is giving me an error:
https://docs.ray.io/en/latest/rllib/rllib-training.html

from ray.rllib.algorithms.algorithm import Algorithm
checkpoint_path = "..."  # the path where my ray_results are stored
algo = Algorithm.from_checkpoint(checkpoint_path)

Error: Class 'Algorithm' has no 'from_checkpoint' member

@Username1,

do you know what a checkpoint is in RLlib? If you look into the directory where Tune stored all the checkpoints and the TensorBoard events, i.e. ray_results (probably ray_results/PPO/PPO_<Hyperpar1>_<Hyperpar2>_..._<Timestamp>), you will see every checkpoint that was written.
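For example, a quick way to list just the checkpoint directories (the experiment path below is a placeholder; point it at your own):

import os

# Placeholder path; adjust to your own experiment directory.
exp_dir = os.path.expanduser("~/ray_results/custom_env")
for root, dirs, files in os.walk(exp_dir):
    for d in dirs:
        if d.startswith("checkpoint_"):
            print(os.path.join(root, d))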

In your case you could also look at best_result.logdir to see where the best checkpoint is stored. Then you use:

from ray.rllib.algorithms.ppo import PPOConfig

# Put here the same configuration as used in training.
algo_config = (
    PPOConfig()
    .rollouts(...)
    ...
)
# The environment needs to be registered before you use it here.
algo = algo_config.build(env="custom_env")
algo.restore(checkpoint_path)
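(Side note: if I remember correctly, Algorithm.from_checkpoint() was only added to RLlib after the 2.0 release, which would explain the "no 'from_checkpoint' member" error on Ray 2.0; rebuilding the config and calling restore() as above avoids it.)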

You then get the observations from your environment and get an action from the algorithm via

obs = ...  # get the observation from your environment (e.g. env.reset())
action = algo.compute_single_action(obs)
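Putting the pieces together, a rough end-to-end inference script could look like this (untested sketch: my_envs, CustomEnv, and the exact config calls are placeholders for whatever you used in training, and I assume the old gym API with a 4-tuple step return):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env

from my_envs import CustomEnv  # hypothetical module; import your own env class

# Register the env under the same name that was used in training.
register_env("custom_env", lambda env_config: CustomEnv(env_config))

# Rebuild the algorithm with the same config as in training.
algo = PPOConfig().build(env="custom_env")

# Point this at one of the checkpoint_... directories under ray_results.
checkpoint_path = "..."
algo.restore(checkpoint_path)

# Roll out one episode.
env = CustomEnv({})
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    action = algo.compute_single_action(obs)
    obs, reward, done, info = env.step(action)
    episode_reward += reward

print("episode_reward:", episode_reward)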

I suggest reading carefully through the RLlib documentation and studying the checkpoint directory that Tune has created, to get an understanding of how RLlib works.

Thank you @Lars_Simon_Zehnder

I am getting the checkpoint directory as recommended here, but I still get the same error:

# Get the best result based on a particular metric.
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")

# Get the best checkpoint corresponding to the best result and save the checkpoint path.
best_checkpoint = best_result.checkpoint
print(f"Trained model saved at {best_checkpoint}")