Evaluating a Trained Model in Hierarchical Reinforcement Learning

Background:

We are working with ray-2.12.0 and RLlib for hierarchical reinforcement learning (HRL). We are trying to load a trained checkpoint and evaluate its low-level and high-level policies. Below is the code snippet we are currently using:

# The following code has issues:
import numpy as np

from ray.rllib.policy.policy import Policy

# HierarchicalWindyMazeEnv and env_config come from our training setup
# (the hierarchical_training.py example referenced below).
model_save_path = "./checkpoint_000000"
policy = Policy.from_checkpoint(model_save_path)
env = HierarchicalWindyMazeEnv(env_config)
for episode in range(3):
    obs = env.reset()
    # High-level policy picks a goal for the low-level policy
    action = policy['high_level_policy'].compute_single_action(obs)
    next_obs, reward, terminated, truncated, info = env.step(action)
    obs = next_obs

    # Run for a maximum of 200 time steps
    for time_step in range(200):
        # Concatenate current observation with the goal
        full_obs = np.concatenate([env.cur_obs, env.current_goal])
        
        # Low-level policy decision
        low_level_action = policy['low_level_policy'].compute_single_action(full_obs)
        next_obs, reward, terminated, truncated, info = env.step(low_level_action)
        
        # Update observation
        obs = next_obs

Reference links:

  1. demo_after_training
  2. hierarchical_training (ray/rllib/examples/hierarchical/hierarchical_training.py at ray-2.12.0 · ray-project/ray · GitHub)

Problems Encountered:

1. How do we load a trained model using PPOConfig?
We are currently using Policy.from_checkpoint() to load the checkpoint, but we would like to know how to load the pre-trained PPO model via PPOConfig instead (see the loading sketch right after this list for what we have tried).
2. How do we correctly evaluate low_level_policy and high_level_policy?
We are indexing policy['low_level_policy'] and policy['high_level_policy'] directly for evaluation, but this does not behave as expected. We are unsure how to correctly invoke these two policies while stepping through a hierarchical environment (our current guess at the evaluation loop is sketched under Expected Solution below).
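
What we have tried for question 1 (a minimal sketch only; Algorithm.from_checkpoint and the PPOConfig route are our best guesses, and the policy IDs, policy_mapping_fn, and env_config are assumptions copied from the hierarchical_training.py example rather than something we know to be correct):

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.ppo import PPOConfig

checkpoint_path = "./checkpoint_000000"

# Option A: restore the whole Algorithm; the config is read from the checkpoint.
algo = Algorithm.from_checkpoint(checkpoint_path)

# Option B: rebuild a PPOConfig that mirrors the training config, build the
# Algorithm from it, then restore the trained weights from the checkpoint.
def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    # Same mapping as in the hierarchical_training.py example (assumption).
    return "low_level_policy" if agent_id.startswith("low_level_") else "high_level_policy"

config = (
    PPOConfig()
    .environment(HierarchicalWindyMazeEnv, env_config=env_config)
    .multi_agent(
        policies={"high_level_policy", "low_level_policy"},
        policy_mapping_fn=policy_mapping_fn,
    )
)
algo = config.build()
algo.restore(checkpoint_path)

# Individual policies can then be pulled out of the restored Algorithm:
high_level_policy = algo.get_policy("high_level_policy")
low_level_policy = algo.get_policy("low_level_policy")

Is one of these the intended way to load a PPO checkpoint for evaluation, or is Policy.from_checkpoint() still the preferred route here?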

Expected Solution:

  1. Clarification on how to load and evaluate a trained PPO model using PPOConfig.
  2. Guidance on how to correctly evaluate and invoke low_level_policy and high_level_policy in a hierarchical RL setup; our current guess at such an evaluation loop is sketched below.
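
Our current guess at the evaluation loop, assuming the Algorithm restored in the sketch above and assuming HierarchicalWindyMazeEnv is a MultiAgentEnv whose per-step observation dict contains exactly the agent that has to act next (as in the hierarchical_training.py example). The agent-ID prefixes and the explore=False flag are assumptions on our part:

env = HierarchicalWindyMazeEnv(env_config)

for episode in range(3):
    obs, info = env.reset()
    terminated = {"__all__": False}
    truncated = {"__all__": False}
    total_reward = 0.0

    while not terminated["__all__"] and not truncated["__all__"]:
        actions = {}
        for agent_id, agent_obs in obs.items():
            # Route each agent's observation to its policy (agent-ID prefixes
            # taken from the hierarchical_training.py example).
            policy_id = (
                "low_level_policy"
                if agent_id.startswith("low_level_")
                else "high_level_policy"
            )
            actions[agent_id] = algo.compute_single_action(
                agent_obs, policy_id=policy_id, explore=False
            )
        obs, rewards, terminated, truncated, info = env.step(actions)
        total_reward += sum(rewards.values())

    print(f"Episode {episode}: total reward = {total_reward}")

Is this the recommended pattern, or should we call compute_single_action() on the Policy objects directly (and, if so, how should the low-level observation and the goal be combined before passing them in)?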

Thank you so much for your help, I really appreciate it!