Add custom policy to config on a non multi-agent setup

Jon_Flynn · June 4, 2023, 11:41am

I’m wondering how to alter the default policy in the PPOConfig. I’m aware it can be set up like so for a multiagent environment:

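from ray.rllib.algorithms import ppo

# MyEnv, CustomPolicy, observation_space and ActionSpace are my own objects.


def policy_map_fn(agent_id: str, _episode=None, _worker=None, **_kwargs) -> str:
    """Map every agent_id to the single policy ID 'policy'."""
    return 'policy'

algo = (
    ppo.PPOConfig()
    .environment(MyEnv)
    .multi_agent(
        policies={
            # policy ID -> (policy class, obs space, action space, config overrides)
            "policy": (
                CustomPolicy,
                observation_space,
                ActionSpace(),
                ppo.PPOConfig.overrides(gamma=0.9),
            ),
        },
        policy_mapping_fn=policy_map_fn,
    )
    .framework("torch")
    .build()
)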

However, I can’t find anywhere in the API or documentation how to use a custom policy in a non-multi-agent setup. How is this done?

mannyv · June 4, 2023, 2:36pm

github.com

ray-project/ray/blob/master/rllib/examples/custom_tf_policy.py

import argparse
import os

import ray
from ray import air, tune
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.evaluation.postprocessing import discount_cumsum
from ray.rllib.policy.tf_policy_template import build_tf_policy
from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()

parser = argparse.ArgumentParser()
parser.add_argument("--stop-iters", type=int, default=200)
parser.add_argument("--num-cpus", type=int, default=0)


def policy_gradient_loss(policy, model, dist_class, train_batch):
    logits, _ = model(train_batch)
    action_dist = dist_class(logits, model)

This file has been truncated.

Jon_Flynn · June 4, 2023, 6:23pm

Thanks, I used this code to build my algorithm class:

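from ray.rllib.algorithms import ppo


class AlteredPPOAlgo(ppo.PPO):
    @classmethod
    def get_default_policy_class(cls, config):
        # Use my own policy class instead of PPO's default one.
        return CustomPolicy

config = (
    ppo.PPOConfig(algo_class=AlteredPPOAlgo)
    .environment(MyEnv)
    .framework("torch")
)

# "my_torch_model" and "my_dist" are names I registered with ModelCatalog beforehand.
config.model.update({
    "custom_model": "my_torch_model",
    "custom_action_dist": "my_dist",
})

algo = config.build()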
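
For anyone wondering why overriding get_default_policy_class is enough, here is a minimal sketch of the pattern in plain Python. It does not use Ray at all: ToyAlgorithm, AlteredToyAlgorithm, DefaultPolicy and the "policy_class" config key are hypothetical stand-ins, and the lookup logic is an assumption about the general idea rather than RLlib's actual implementation.

```python
# Toy stand-ins (hypothetical, NOT RLlib classes) that mimic the lookup pattern.


class DefaultPolicy:
    """Placeholder for the policy an algorithm would use out of the box."""


class CustomPolicy(DefaultPolicy):
    """Placeholder for a user-defined policy."""


class ToyAlgorithm:
    """Resolves its policy class through an overridable classmethod."""

    def __init__(self, config=None):
        self.config = config or {}
        # An explicitly configured class wins; otherwise ask the hook.
        policy_cls = self.config.get("policy_class")
        if policy_cls is None:
            policy_cls = self.get_default_policy_class(self.config)
        self.policy = policy_cls()

    @classmethod
    def get_default_policy_class(cls, config):
        return DefaultPolicy


class AlteredToyAlgorithm(ToyAlgorithm):
    """Same algorithm, but with a different default policy."""

    @classmethod
    def get_default_policy_class(cls, config):
        return CustomPolicy


base = ToyAlgorithm()
altered = AlteredToyAlgorithm()
print(type(base.policy).__name__)
print(type(altered.policy).__name__)
```

The subclass changes only the hook, so nothing about the multi-agent settings needs to be touched in the config.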
