Custom action sampler not taking effect

bmanczak · November 29, 2021, 11:13am

Hi there!

I am trying to write a custom action sampler and use it to train the PPO agent. However, the way of doing that is described in the documentation Extending Existing Policies seems to have no effect on the policy and just the default policy is used for training.

The dummy example is shown below:

import ray
import ray.rllib.agents.ppo as ppo
from ray.tune.logger import pretty_print
from ray.rllib.agents.ppo.ppo_torch_policy import PPOTorchPolicy
from ray.rllib.agents.ppo import PPOTrainer 


def build_action_sampler(policy, model, input_dict, state, explore, timestep): 
        return 0, None, None # the agent should crash or not learn with this sampler

CustomPPOTorchPolicy = PPOTorchPolicy.with_updates(
    name="CustomPPOTorchPolicy",action_sampler_fn=build_action_sampler)
 
CustomTrainer = PPOTrainer.with_updates(
default_policy=CustomPPOTorchPolicy)

config = ppo.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
config["num_workers"] = 1
config["framework"] = "torch"
trainer = CustomTrainer(config=config, env="CartPole-v0")

for i in range(1000):
   result = trainer.train()
   print(pretty_print(result))

What’s happening here?

Topic		Replies	Views
Not able to locate rllib train function code RLlib	6	311	March 22, 2023
Controlling compute_actions during training RLlib	0	376	November 26, 2021
Training a custom agent against a random policy RLlib	0	215	November 14, 2023
Explorative action or not? RLlib	1	268	April 26, 2022
Custom model with LSTM crashes PPO sampler.py RLlib	0	266	November 24, 2023

Custom action sampler not taking effect

Related topics