Custom action sampler not taking effect

Hi there!

I am trying to write a custom action sampler and use it to train a PPO agent. However, the approach described in the documentation under "Extending Existing Policies" seems to have no effect: the default policy is used for training instead.

A minimal dummy example is shown below:

import ray
import ray.rllib.agents.ppo as ppo
from ray.tune.logger import pretty_print
from ray.rllib.agents.ppo.ppo_torch_policy import PPOTorchPolicy
from ray.rllib.agents.ppo import PPOTrainer


def build_action_sampler(policy, model, input_dict, state, explore, timestep):
    # With this sampler the agent should either crash or fail to learn.
    return 0, None, None


CustomPPOTorchPolicy = PPOTorchPolicy.with_updates(
    name="CustomPPOTorchPolicy",
    action_sampler_fn=build_action_sampler,
)

CustomTrainer = PPOTrainer.with_updates(
    default_policy=CustomPPOTorchPolicy,
)

ray.init()

config = ppo.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
config["num_workers"] = 1
config["framework"] = "torch"
trainer = CustomTrainer(config=config, env="CartPole-v0")

for i in range(1000):
    result = trainer.train()
    print(pretty_print(result))

What’s happening here?
