How to use my pretrained model as policy and value network

Hi @cubpaw,

We have recently rolled out a new module and learning stack inside RLlib (RLModules) that gives you a lot of flexibility for achieving exactly what you described. Here is the example code.

We recommend using the nightly releases if you want to try the RLModule features.

Having said that, it is still possible to do this in the current stable release; it just needs a bit more work. It involves two steps:

  1. You need to define a custom model that initializes the actor and value networks with the architecture they were pretrained on. At this stage, make sure you can train an algorithm with this custom model via RLlib (a minimal sketch follows this list). Example:
    ray/rllib/examples/custom_rnn_model.py at master · ray-project/ray · GitHub
  2. You need to use the callbacks' `on_algorithm_init` hook to load the pretrained weights into your custom model during initialization (see the second sketch after this list).
    Related discourse q: Updating policy_mapping_fn while using tune.run() and restoring from a checkpoint - #3 by Muff2n
    Related example: ray/rllib/examples/restore_1_of_n_agents_from_checkpoint.py at master · ray-project/ray · GitHub
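To make step 1 more concrete, here is a minimal sketch for the stable (ModelV2) stack. The layer sizes, the `"pretrained_actor_critic"` model name, and the overall architecture are assumptions; replace them with whatever your networks were actually pretrained with so the weights load cleanly.

```python
import numpy as np
import torch
import torch.nn as nn

from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class PretrainedActorCritic(TorchModelV2, nn.Module):
    """Custom model whose policy and value nets mirror the pretrained architecture."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs, model_config, name)
        nn.Module.__init__(self)
        obs_dim = int(np.prod(obs_space.shape))
        # Hypothetical architecture -- it must match the one the weights were pretrained on.
        self.policy_net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, num_outputs)
        )
        self.value_net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )
        self._value_out = None

    def forward(self, input_dict, state, seq_lens):
        obs = input_dict["obs_flat"].float()
        self._value_out = self.value_net(obs).squeeze(-1)
        return self.policy_net(obs), state

    def value_function(self):
        return self._value_out


# Register the model so it can be referenced by name in the algorithm config.
ModelCatalog.register_custom_model("pretrained_actor_critic", PretrainedActorCritic)
```

And a sketch of step 2: loading the pretrained weights in `on_algorithm_init` and wiring everything into a PPO config. The checkpoint path `pretrained_actor_critic.pt` and the `"policy"` / `"value"` keys inside it are hypothetical placeholders for however you saved your pretrained networks.

```python
import torch

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.algorithms.ppo import PPOConfig


class LoadPretrainedWeights(DefaultCallbacks):
    """Copies the pretrained weights into the custom model once the algorithm is built."""

    def on_algorithm_init(self, *, algorithm, **kwargs):
        # Hypothetical checkpoint holding one state dict per sub-network.
        state = torch.load("pretrained_actor_critic.pt")
        model = algorithm.get_policy().model
        model.policy_net.load_state_dict(state["policy"])
        model.value_net.load_state_dict(state["value"])
        # Broadcast the updated weights from the local worker to all rollout workers.
        algorithm.workers.sync_weights()


config = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("torch")
    .training(model={"custom_model": "pretrained_actor_critic"})
    .callbacks(LoadPretrainedWeights)
)
algo = config.build()
```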

cc @avnishn
