Background:
We are working with ray-2.12.0 and RLlib for hierarchical reinforcement learning (HRL). We are trying to load and evaluate an already-trained low-level policy and high-level policy. Below is the code snippet we are currently using:
# The following code has issues:
import numpy as np

from ray.rllib.policy.policy import Policy

# HierarchicalWindyMazeEnv and env_config come from the hierarchical_training
# example referenced below.
model_save_path = "./checkpoint_000000"
# For a multi-policy algorithm checkpoint, Policy.from_checkpoint() returns a
# dict mapping policy IDs to Policy objects.
policy = Policy.from_checkpoint(model_save_path)
env = HierarchicalWindyMazeEnv(env_config)

for episode in range(3):
    # MultiAgentEnv.reset() returns (obs_dict, info_dict) under the gymnasium API.
    obs, info = env.reset()
    # Policy.compute_single_action() returns (action, state_outs, extra_fetches).
    action, _, _ = policy["high_level_policy"].compute_single_action(obs)
    next_obs, reward, terminated, truncated, info = env.step(action)
    obs = next_obs
    # Run for a maximum of 200 time steps.
    for time_step in range(200):
        # Concatenate the current observation with the goal picked by the
        # high-level policy.
        full_obs = np.concatenate([env.cur_obs, env.current_goal])
        # Low-level policy decision.
        low_level_action, _, _ = policy["low_level_policy"].compute_single_action(full_obs)
        next_obs, reward, terminated, truncated, info = env.step(low_level_action)
        # Update observation.
        obs = next_obs
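For what it is worth, our current understanding (possibly mistaken, hence this question) is that Policy.from_checkpoint() can also be told which policy IDs to load, which is how we arrived at the dict-style access above:

from ray.rllib.policy.policy import Policy

# Load only the two policies we care about from the algorithm checkpoint;
# without policy_ids, all policies found in the checkpoint are returned.
policies = Policy.from_checkpoint(
    "./checkpoint_000000",
    policy_ids=["high_level_policy", "low_level_policy"],
)
print(list(policies.keys()))  # we expect ['high_level_policy', 'low_level_policy']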
Reference links:
- demo_after_training
- hierarchical_training (ray/rllib/examples/hierarchical/hierarchical_training.py at ray-2.12.0 · ray-project/ray · GitHub)
Problems Encountered:
1. How do we load a trained model using PPOConfig?
We are currently using Policy.from_checkpoint() to load the model, but we would like to know how to load a pre-trained PPO model via PPOConfig instead (see the first sketch after this list).
2. How do we correctly evaluate low_level_policy and high_level_policy?
We are accessing policy['low_level_policy'] and policy['high_level_policy'] directly for evaluation, but something seems wrong. We are unsure how to correctly evaluate and invoke these two policies in a hierarchical environment (see the second sketch, after the Expected Solution list).
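For problem 1, here is a rough sketch of the kind of loading code we imagine, based on our (possibly wrong) reading of the RLlib API. HierarchicalWindyMazeEnv and env_config are the same objects as in the snippet above; the multi_agent() spec and policy_mapping_fn are our guesses and would have to match whatever was used during training:

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.ppo import PPOConfig

# Option A: restore the whole Algorithm (config included) from the checkpoint.
algo = Algorithm.from_checkpoint("./checkpoint_000000")

# Option B: rebuild from a PPOConfig that mirrors the training config, then
# restore the checkpoint into it. NOTE: the multi_agent() spec and the
# policy_mapping_fn below are our guesses and must match what was used in training.
config = (
    PPOConfig()
    .environment(HierarchicalWindyMazeEnv, env_config=env_config)
    .multi_agent(
        policies={"high_level_policy", "low_level_policy"},
        policy_mapping_fn=lambda agent_id, *args, **kwargs: (
            "high_level_policy"
            if agent_id.startswith("high_level")
            else "low_level_policy"
        ),
    )
)
algo = config.build()
algo.restore("./checkpoint_000000")

# Individual policies can then be pulled out by ID:
high_level_policy = algo.get_policy("high_level_policy")
low_level_policy = algo.get_policy("low_level_policy")

Is Option B the intended way to combine PPOConfig with an existing checkpoint, or is Algorithm.from_checkpoint() preferred?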
Expected Solution:
- Clarification on how to load and evaluate a trained PPO model using PPOConfig.
- Guidance on how to correctly evaluate and invoke low_level_policy and high_level_policy in a hierarchical RL setup (a rough sketch of the loop we are imagining follows below).
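For the evaluation question, this is roughly the loop we are trying to arrive at: treat the env as a MultiAgentEnv, route each agent's observation to the matching policy, and step with an action dict. The "high_level"/"low_level" agent-ID prefixes and the "__all__" termination key are based on our reading of the hierarchical_training example and may be wrong; algo is the restored Algorithm from the previous sketch:

# Rough sketch of the evaluation loop we are aiming for; `algo` is the restored
# Algorithm from the previous sketch and `env` is the HierarchicalWindyMazeEnv.
for episode in range(3):
    obs_dict, info_dict = env.reset()
    terminated = {"__all__": False}
    truncated = {"__all__": False}
    while not terminated["__all__"] and not truncated["__all__"]:
        action_dict = {}
        for agent_id, agent_obs in obs_dict.items():
            # Route each agent to its policy; the agent-ID prefixes are our
            # assumption from the hierarchical_training example.
            policy_id = (
                "high_level_policy"
                if agent_id.startswith("high_level")
                else "low_level_policy"
            )
            # Algorithm.compute_single_action() returns just the action by default.
            action_dict[agent_id] = algo.compute_single_action(
                agent_obs, policy_id=policy_id, explore=False
            )
        obs_dict, rewards, terminated, truncated, info_dict = env.step(action_dict)

Does this per-agent routing match how RLlib expects a hierarchical (multi-agent) env to be evaluated, or should we be calling the Policy objects directly instead?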
Thank you so much for your help, I really appreciate it!