How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am using Ray 2.0 to train my PPO policy in a custom environment. I want to deploy (rollout) my trained policy in the environment.
My training is as follows:
train_steps = 10000
experiment_name = 'custom_env'

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),  # to run with Tune
    run_config=air.RunConfig(
        name=experiment_name,
        stop={"timesteps_total": train_steps},  # if I delete this, it runs forever
        # verbose=2,
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=50, checkpoint_at_end=True
        ),
    ),
)
results = tuner.fit()
# Get the best result based on a particular metric.
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")
# Get the best checkpoint corresponding to the best result and save the checkpoint path
best_checkpoint = best_result.checkpoint
print(f"Trained model saved at {best_checkpoint}")
Questions
Now I want to use this trained policy in a different script file to do inference, as below, but I don't know:
Q1) how to load the trained policy?
Q2) do I have to register the environment again?
# Inference
episode_reward = 0
done = False
obs = env.reset()  # here I don't know how to create/call my custom env
while not done:
    action = algo.compute_single_action(obs)  # here I don't know what algo is
    obs, reward, done, info = env.step(action)
    episode_reward += reward
print('episode_reward', episode_reward)
Hi Lucia, I think this is you?
Short answers: Yes, you have to register your environment again if it is in another script.
Here you can see how inference is done with RLlib agents: Serving RLlib Models — Ray 2.2.0
You do not need the server, but you need to restore the algorithm and then use its compute_single_action() function.
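For the registration part, here is a minimal sketch of what the inference script could do. It assumes a hypothetical environment class MyCustomEnv that takes an env_config dict; both the class name and the import path are placeholders for your own code:

from ray.tune.registry import register_env

from my_envs import MyCustomEnv  # placeholder import for your own environment class

# Register the env under the same name used during training ("custom_env"),
# so the algorithm config can refer to it by that string.
register_env("custom_env", lambda env_config: MyCustomEnv(env_config))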
Hope this helps
Simon
Yes! Thank you very much @Lars_Simon_Zehnder
How do I restore the algorithm?
Because in that example, I see that they do:
checkpoint_dir = algo.save("/tmp/rllib_checkpoint")
But I am training with Tune (tuner.fit()).
And this one is giving me an error:
https://docs.ray.io/en/latest/rllib/rllib-training.html
from ray.rllib.algorithms.algorithm import Algorithm

checkpoint_path = "..."  # string pointing to where my ray_results checkpoint is
algo = Algorithm.from_checkpoint(checkpoint_path)
Error: Class "Algorithm" has no "from_checkpoint" member
@Username1,
do you know what a checkpoint is in RLlib? If you look into your checkpoint directory where Tune has stored all the checkpoints and the TensorBoard events, i.e. ray_results (probably in ray_results/PPO/PPO_<Hyperpar1>_<Hyperpar2>_..._<Timestamp>), you will see all the checkpoints that were stored.
In your case you could also look at best_result.logdir to see where the best checkpoint is stored. Then you use
from ray.rllib.algorithms.ppo import PPOConfig

# Put here the same configuration as used in training.
algo_config = (
    PPOConfig()
    .rollouts(...)  # same rollout settings as in training; add the rest of your config here
)

# Here the environment needs to be registered before you use it.
algo = algo_config.build(env="custom_env")
algo.restore(checkpoint_path)
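Since you trained with tuner.fit(), the checkpoint you get from results.get_best_result() is an AIR Checkpoint object rather than a plain path string. A minimal sketch for turning it into a directory that algo.restore() accepts, in the script where you still have the results from tuner.fit(), assuming the Ray AIR Checkpoint.to_directory() method is available in your Ray version:

# Sketch: materialize the best Tune checkpoint as a local directory.
# Assumption: ray.air.Checkpoint.to_directory() exists in your Ray version.
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")
best_checkpoint = best_result.checkpoint          # an air.Checkpoint, not a plain path string
checkpoint_path = best_checkpoint.to_directory()  # local directory usable by algo.restore()

That checkpoint_path is what goes into algo.restore() above.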
You then get the observations from your environment and get an action from the algorithm via
obs = ...  # get the observation from your environment (e.g. from env.reset())
action = algo.compute_single_action(obs)
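Putting it together, here is a sketch of the full rollout loop from your first post. It assumes the classic gym API where step() returns four values, and a hypothetical MyCustomEnv class for constructing the environment directly; adjust both to your actual environment:

# Sketch: full inference rollout with the restored algorithm.
env = MyCustomEnv(env_config={})  # placeholder; construct your env however you do in training
obs = env.reset()
done = False
episode_reward = 0
while not done:
    action = algo.compute_single_action(obs)   # ask the restored policy for an action
    obs, reward, done, info = env.step(action)
    episode_reward += reward
print("episode_reward", episode_reward)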
I suggest reading carefully through the RLlib documentation and studying the checkpoint directory that Tune has created, to get an understanding of how RLlib works.
Thank you @Lars_Simon_Zehnder
I am getting the checkpoint directory as recommended here, but still the same error.
# Get the best result based on a particular metric.
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")
# Get the best checkpoint corresponding to the best result and save the checkpoint path
best_checkpoint = best_result.checkpoint
print(f"Trained model saved at {best_checkpoint}")