Transfer Learning for Multi-Agent Environments with RLlib

I am having a problem with RLlib:
I trained a network and it achieved good results.
When restoring the last checkpoint, everything works fine. However, when I initialize a new trainer (configured like the trained one) and set its weights equal to the trained one's, I do not get good results.



preTrained_trainer = PPOTrainer(config=config_trained, env=config_trained["env"])
# Restore all policies from checkpoint.
preTrained_trainer.restore(config_checkpoint)
# Get trained weights for all policies.
trained_weights = preTrained_trainer.get_weights()

new_trainer = PPOTrainer(config=config_trained, env=config_trained["env"])
# Set the trained weights on the new trainer.
new_trainer.set_weights(trained_weights)

PS: I thought of copying the filters by doing this:

# Copy the filters; policy_frozen lists all the trained policies.
for policy_name in policy_frozen:
    new_trainer.workers.local_worker().filters[policy_name] = preTrained_trainer.workers.local_worker().filters[policy_name]

However, I still have bad results.

Did I miss something? Should I set something else in addition to the weights in order to get the same trainer?
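For reference, this is the kind of custom restore helper I have in mind. It is only a sketch: the helper name is mine, and I am assuming that `Trainer.set_weights()` only updates the local worker, so the weights also have to be pushed to the remote rollout workers (the `foreach_worker()` call on the `WorkerSet` is an assumption about the RLlib version):

```python
def sync_weights_and_filters(src_trainer, dst_trainer, policy_ids):
    """Copy trained weights and observation filters from src_trainer to
    dst_trainer, then push the weights to every rollout worker of
    dst_trainer (hypothetical helper; assumed RLlib Trainer/WorkerSet API)."""
    weights = src_trainer.get_weights(policy_ids)
    dst_trainer.set_weights(weights)

    # Observation filters (e.g. MeanStdFilter) are stateful: copy them too,
    # using .copy() so the two trainers do not share the same filter object.
    src_filters = src_trainer.workers.local_worker().filters
    dst_filters = dst_trainer.workers.local_worker().filters
    for pid in policy_ids:
        dst_filters[pid] = src_filters[pid].copy()

    # set_weights() on the trainer may only touch the local worker, so
    # broadcast the weights to all rollout workers as well.
    dst_trainer.workers.foreach_worker(lambda w: w.set_weights(weights))
```

If the remote workers keep their freshly initialized weights or filters, rollouts driven by them would look untrained even though the local worker is correct.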

Hi @wzaielamri,

Do you have a simple but complete reproduction script you could provide?

Here is the whole code. It rolls out an episode of my MuJoCo environment (Ant-Agent).
The question is not directly tied to the code provided: when copying the weights of the 4 policies into “new_trainer”, the new trainer does not achieve the same results as “preTrained_trainer”.

So the question is: what could be wrong? Is it enough to copy the weights? Are the filters also important to copy? And is there anything else that should be copied in order to get the same performance?

PS: I know it is possible to restore the PPO trainer directly with the restore function. However, I want to initialize my own trainer from another one for later use; in other words, I want to have my own customized restore function.

import ray
import pickle5 as pickle
import os
import gym
import numpy as np
from ray.tune.registry import get_trainable_cls
from ray.rllib.evaluation.worker_set import WorkerSet
from maze_envs.quantruped_centralizedController_environment import Quantruped_Centralized_Env
from ray.rllib.agents.ppo import PPOTrainer

from evaluation.rollout_episodes import rollout_episodes

"""
    Visualizing a learned (multiagent) controller,
    for evaluation or visualisation.
    
    This is adapted from rllib's rollout.py
    (github.com/ray/rllib/rollout.py)
"""

# Setting number of steps and episodes
num_steps = int(600)
num_episodes = int(1)

ray.init()

smoothness = 1

# Selecting checkpoint to load
config_checkpoints = [
    './ray_results/2_2_0_QuantrupedMultiEnv/PPO_QuantrupedMultiEnv_2c71b_00004_4_2022-02-14_18-11-40/checkpoint_002500/checkpoint-2500',
]

for config_checkpoint in config_checkpoints:
    config_dir = os.path.dirname(config_checkpoint)
    config_path = os.path.join(config_dir, "params.pkl")

    # Loading configuration for checkpoint.
    if not os.path.exists(config_path):
        config_path = os.path.join(config_dir, "../params.pkl")

    if os.path.exists(config_path):
        with open(config_path, "rb") as f:
            config_trained = pickle.load(f)


    # Adjusting the loaded configuration.
    if "num_workers" in config_trained:
        config_trained["num_workers"] = min(1, config_trained["num_workers"])
    cls = get_trainable_cls('PPO')
    # Setting config values (required for compatibility between versions)
    config_trained["create_env_on_driver"] = True
    config_trained['env_config']['hf_smoothness'] = smoothness
    if "no_eager_on_workers" in config_trained:
        del config_trained["no_eager_on_workers"]


    config_trained['num_envs_per_worker'] = 1  # 4

    preTrained_trainer = PPOTrainer(config=config_trained, env=config_trained["env"])
    # Restore all policies from checkpoint.
    preTrained_trainer.restore(config_checkpoint)
    # Get trained weights for all policies.
    trained_weights = preTrained_trainer.get_weights()

    new_trainer = PPOTrainer(config=config_trained, env=config_trained["env"])
    # Set the trained weights on the new trainer.
    new_trainer.set_weights(trained_weights)

    policy_frozen = ["Agent_0_policy", "Agent_1_policy", "Agent_2_policy", "Agent_3_policy"]
    # Copy the filters; policy_frozen lists all the trained policies.
    for policy_name in policy_frozen:
        new_trainer.workers.local_worker().filters[policy_name] = preTrained_trainer.workers.local_worker().filters[policy_name]


    # Retrieve environment for the trained agent.
    if hasattr(new_trainer, "workers") and isinstance(new_trainer.workers, WorkerSet):
        env = new_trainer.workers.local_worker().env
        
    save_image_dir = "./videos/" + \
        config_checkpoint.split("/")[-4]


    # Rolling out simulation = stepping through simulation.
    reward_eps, steps_eps, dist_eps, power_total_eps, vel_eps, cot_eps = rollout_episodes(env, new_trainer, num_episodes=num_episodes,
                                                                                          num_steps=num_steps, render=True, camera_name="side_fixed", plot=False, save_images=save_image_dir+"/img_")

    new_trainer.stop()

@mannyv

So what are the parts that I am missing to get the same results as when using the restore function?
As I understand from the code of the restore function on GitHub:

  • Weights are being restored: important!
  • Filter values are being restored: important!
  • Trainer state (e.g., checkpoint number, episode counters, etc.): not important to copy, correct?

So by copying the weights and filters, everything should be fine.
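To double-check that the weight copy actually took effect, here is a quick sanity check I use (a debugging sketch of my own; it assumes `get_weights()` returns a nested dict of policy id to named parameter arrays, and that NumPy is available):

```python
import numpy as np

def weights_match(trainer_a, trainer_b):
    """Return True iff both trainers hold numerically identical weights
    for every policy (assumes get_weights() -> {policy_id: {name: array}})."""
    wa, wb = trainer_a.get_weights(), trainer_b.get_weights()
    if wa.keys() != wb.keys():
        return False
    for pid in wa:
        if wa[pid].keys() != wb[pid].keys():
            return False
        for name in wa[pid]:
            if not np.array_equal(np.asarray(wa[pid][name]),
                                  np.asarray(wb[pid][name])):
                return False
    return True
```

If this returns True but rollouts still differ, the gap is in something other than the weights (e.g. the observation filters or the remote workers).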

PS: Just to clarify why I need this: later on I want to restore only specific policies and replace the others with new ones that have different observation spaces, etc. (transfer learning on specific policies). For now I am restoring everything manually to test that the restore logic works, before proceeding to the next step.
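For that later step, restoring only a subset of policies could look roughly like this (a sketch with a hypothetical helper name; `policies_to_restore` would be e.g. `["Agent_0_policy"]`, matching the policy ids from my script):

```python
def restore_selected_policies(src_trainer, dst_trainer, policies_to_restore):
    """Copy weights and filters only for the named policies; every other
    policy in dst_trainer keeps its fresh initialization
    (hypothetical helper; assumed RLlib Trainer API)."""
    all_weights = src_trainer.get_weights()
    selected = {pid: w for pid, w in all_weights.items()
                if pid in policies_to_restore}
    dst_trainer.set_weights(selected)

    # Copy the matching observation filters as well.
    src_filters = src_trainer.workers.local_worker().filters
    dst_filters = dst_trainer.workers.local_worker().filters
    for pid in policies_to_restore:
        dst_filters[pid] = src_filters[pid].copy()
    return selected
```

The policies that are replaced by new ones (with a different observation space) would simply be left out of `policies_to_restore`.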