On_postprocess_traj can not be called

Theo_Fan · July 20, 2025, 5:04am

Hi,

I am trying to apply reward shaping after collecting rollout data but before PPO starts training the network. According to the official RLlib documentation (see screenshot), the recommended way is to use the on_postprocess_trajectory callback.

However, when I implemented the callback class (see minimal example below), the method on_postprocess_trajectory is never called during training.

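The following is my test code:

from typing import Dict, Optional

import gymnasium as gym
import ray
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.callbacks.callbacks import RLlibCallback
from ray.rllib.evaluation.episode_v2 import EpisodeV2
from ray.rllib.policy.policy import Policy
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger
from ray.rllib.utils.typing import PolicyID
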

class InspectBatchCallback(RLlibCallback):
    def __init__(self):
        super().__init__()
        self.episode_cnt = 0

    def on_environment_created(self, *, env_runner,
                               metrics_logger: Optional[MetricsLogger] = None,
                               env: gym.Env, **kwargs, ):
        print("ENVIRONMENT CREATED")

    def on_algorithm_init(self, *, algorithm,
                          metrics_logger: Optional[MetricsLogger] = None, **kwargs, ):
        print("ALGORITHM INIT")

    def on_episode_created(self, *, episode: EpisodeV2, **kwargs):
        print("EPISODE CREATED")

    def on_episode_start(self, *, worker, base_env,
                         policies: Optional[Dict[PolicyID, Policy]] = None, **kwargs):
        print("EPISODE START")

    def on_episode_step(self, *, episode, env, **kwargs):
        print("EPISODE STEP CALLED")

    def on_episode_end(self, *, episode, metrics_logger, **kwargs):
        self.episode_cnt += 1
        print("EPISODE END")
        print(f"Episode {self.episode_cnt} finished.")
        print()

    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        print("ON LEARN ON BATCH CALLED")

    def on_postprocess_trajectory(self, worker, episode, agent_id, policy_id,
                                  policies, postprocessed_batch, original_batches, **kwargs):
        print("POSTPROCESS TRAJECTORY CALLED")

    def on_train_result(self, *, algorithm, result, **kwargs):
        print("TRAIN RESULT CALLED")


ray.shutdown()
ray.init(ignore_reinit_error=True)

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("torch")

    .env_runners(
        num_env_runners=2,
        batch_mode="complete_episodes"
    )
    .callbacks(
        InspectBatchCallback
    )
    .learners(
        num_learners=1,
    )
)

tune.Tuner(
    "PPO",
    param_space=config,
    run_config=tune.RunConfig(
        name="ppo_cartpole_exp",
        verbose=1,
        stop={
            "training_iteration": 3,
            "time_total_s": 10,
        },
    )
).fit()

christina · July 21, 2025, 10:23pm

Hi Theo,
We recently started upgrading RLlib to a new API stack, and you can find the migration guide here: ray/doc/source/rllib/new-api-stack-migration-guide.rst at releases/2.47.1 in ray-project/ray on GitHub.

Specifically this part caught my eye:

on_postprocess_trajectory(): The new API stack no longer triggers and calls this method because ConnectorV2 (ray.rllib.connectors.connector_v2.ConnectorV2) pipelines handle trajectory processing entirely. The documentation for ConnectorV2 is under development.

I’ve reached out to the RLlib team for clarification, and if the doc you linked is out of date we will update it accordingly. Let me know if you have any other questions about doing your implementation!
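
In the meantime, the shaping logic itself can be written independently of whichever hook ends up calling it. Below is a minimal sketch in plain Python, not an RLlib API: it assumes potential-based shaping (a technique chosen here for illustration, not something from the docs quoted above), and the names shape_rewards and upright_potential are hypothetical. On the new API stack, logic like this would presumably be called from a custom ConnectorV2 piece; that wiring is left out because the ConnectorV2 docs are still in progress.

```python
from typing import Callable, List, Sequence


def shape_rewards(
    observations: Sequence[Sequence[float]],
    rewards: Sequence[float],
    potential: Callable[[Sequence[float]], float],
    gamma: float = 0.99,
) -> List[float]:
    """Potential-based shaping: r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t).

    `observations` must hold one more entry than `rewards`, because it
    includes the final observation of the (complete) episode.
    """
    if len(observations) != len(rewards) + 1:
        raise ValueError("expected len(observations) == len(rewards) + 1")
    shaped = []
    for t, reward in enumerate(rewards):
        bonus = gamma * potential(observations[t + 1]) - potential(observations[t])
        shaped.append(reward + bonus)
    return shaped


def upright_potential(obs: Sequence[float]) -> float:
    # Hypothetical potential for CartPole-v1: obs[2] is the pole angle in
    # radians, so a pole closer to vertical gets a higher potential.
    return -abs(obs[2])


if __name__ == "__main__":
    # Toy complete episode: three observations, two rewards.
    episode_obs = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.1, 0.0], [0.0, 0.0, 0.05, 0.0]]
    episode_rewards = [1.0, 1.0]
    print(shape_rewards(episode_obs, episode_rewards, upright_potential, gamma=1.0))
```

Keeping the shaping as a pure function over one complete episode (which matches batch_mode="complete_episodes" in the config above) makes it easy to unit-test before plugging it into any pipeline.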
