Background:
We are working with ray-2.12.0 and RLlib for hierarchical reinforcement learning (HRL). We are trying to load and evaluate an already-trained low-level policy and high-level policy. Below is the code snippet we are currently using:
# The following code has issues:
import numpy as np

from ray.rllib.policy.policy import Policy

# HierarchicalWindyMazeEnv and env_config come from the hierarchical_training
# example referenced below.
model_save_path = "./checkpoint_000000"
# For a multi-policy algorithm checkpoint, Policy.from_checkpoint() returns a
# dict mapping policy IDs to Policy objects.
policy = Policy.from_checkpoint(model_save_path)
env = HierarchicalWindyMazeEnv(env_config)

for episode in range(3):
    # MultiAgentEnv.reset() returns (obs_dict, info_dict) under the gymnasium API.
    obs, info = env.reset()
    # Policy.compute_single_action() returns (action, state_outs, extra_fetches).
    action, _, _ = policy["high_level_policy"].compute_single_action(obs)
    next_obs, reward, terminated, truncated, info = env.step(action)
    obs = next_obs
    # Run for a maximum of 200 time steps.
    for time_step in range(200):
        # Concatenate the current observation with the goal picked by the
        # high-level policy.
        full_obs = np.concatenate([env.cur_obs, env.current_goal])
        # Low-level policy decision.
        low_level_action, _, _ = policy["low_level_policy"].compute_single_action(full_obs)
        next_obs, reward, terminated, truncated, info = env.step(low_level_action)
        # Update observation.
        obs = next_obs
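For what it is worth, our current understanding (possibly mistaken, hence this question) is that Policy.from_checkpoint() can also be told which policy IDs to load, which is how we arrived at the dict-style access above:

from ray.rllib.policy.policy import Policy

# Load only the two policies we care about from the algorithm checkpoint;
# without policy_ids, all policies found in the checkpoint are returned.
policies = Policy.from_checkpoint(
    "./checkpoint_000000",
    policy_ids=["high_level_policy", "low_level_policy"],
)
print(list(policies.keys()))  # we expect ['high_level_policy', 'low_level_policy']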
Reference links:
- demo_after_training
- hierarchical_training (ray/rllib/examples/hierarchical/hierarchical_training.py at ray-2.12.0 · ray-project/ray · GitHub)
Problems Encountered:
1. How do we load a trained model using PPOConfig?
We are currently using Policy.from_checkpoint() to load the model, but we would like to know how to load a pre-trained PPO model via PPOConfig instead (see the first sketch after this list).
2. How do we correctly evaluate low_level_policy and high_level_policy?
We are accessing policy['low_level_policy'] and policy['high_level_policy'] directly for evaluation, but something seems wrong. We are unsure how to correctly evaluate and invoke these two policies in a hierarchical environment (see the second sketch, after the Expected Solution list).
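For problem 1, here is a rough sketch of the kind of loading code we imagine, based on our (possibly wrong) reading of the RLlib API. HierarchicalWindyMazeEnv and env_config are the same objects as in the snippet above; the multi_agent() spec and policy_mapping_fn are our guesses and would have to match whatever was used during training:

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.ppo import PPOConfig

# Option A: restore the whole Algorithm (config included) from the checkpoint.
algo = Algorithm.from_checkpoint("./checkpoint_000000")

# Option B: rebuild from a PPOConfig that mirrors the training config, then
# restore the checkpoint into it. NOTE: the multi_agent() spec and the
# policy_mapping_fn below are our guesses and must match what was used in training.
config = (
    PPOConfig()
    .environment(HierarchicalWindyMazeEnv, env_config=env_config)
    .multi_agent(
        policies={"high_level_policy", "low_level_policy"},
        policy_mapping_fn=lambda agent_id, *args, **kwargs: (
            "high_level_policy"
            if agent_id.startswith("high_level")
            else "low_level_policy"
        ),
    )
)
algo = config.build()
algo.restore("./checkpoint_000000")

# Individual policies can then be pulled out by ID:
high_level_policy = algo.get_policy("high_level_policy")
low_level_policy = algo.get_policy("low_level_policy")

Is Option B the intended way to combine PPOConfig with an existing checkpoint, or is Algorithm.from_checkpoint() preferred?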
Expected Solution:
- Clarification on how to load and evaluate a trained PPO model using PPOConfig.
- Guidance on how to correctly evaluate and invoke low_level_policy and high_level_policy in a hierarchical RL setup (a rough sketch of the loop we are imagining follows below).
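For the evaluation question, this is roughly the loop we are trying to arrive at: treat the env as a MultiAgentEnv, route each agent's observation to the matching policy, and step with an action dict. The "high_level"/"low_level" agent-ID prefixes and the "__all__" termination key are based on our reading of the hierarchical_training example and may be wrong; algo is the restored Algorithm from the previous sketch:

# Rough sketch of the evaluation loop we are aiming for; `algo` is the restored
# Algorithm from the previous sketch and `env` is the HierarchicalWindyMazeEnv.
for episode in range(3):
    obs_dict, info_dict = env.reset()
    terminated = {"__all__": False}
    truncated = {"__all__": False}
    while not terminated["__all__"] and not truncated["__all__"]:
        action_dict = {}
        for agent_id, agent_obs in obs_dict.items():
            # Route each agent to its policy; the agent-ID prefixes are our
            # assumption from the hierarchical_training example.
            policy_id = (
                "high_level_policy"
                if agent_id.startswith("high_level")
                else "low_level_policy"
            )
            # Algorithm.compute_single_action() returns just the action by default.
            action_dict[agent_id] = algo.compute_single_action(
                agent_obs, policy_id=policy_id, explore=False
            )
        obs_dict, rewards, terminated, truncated, info_dict = env.step(action_dict)

Does this per-agent routing match how RLlib expects a hierarchical (multi-agent) env to be evaluated, or should we be calling the Policy objects directly instead?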
Thank you so much for your help, I really appreciate it!