Tune and custom logger fails

dbk80 · March 27, 2024, 11:31am

Hi,

I am using a custom callback that works fine when I train an RL model without using tune:

config = ppo.PPOConfig(
        ).environment(MyCustomEnv, env_config=env_config, disable_env_checking=True
        ).training(
            model={
                'custom_model': model_class_name,
                'custom_model_config': custom_model_config
                },
        ).framework(eager_tracing=True, framework=framework
        ).resources(num_gpus=1, num_cpus_per_worker=1, num_cpus_per_learner_worker=2,
        ).rollouts(enable_connectors=False, num_rollout_workers=2).callbacks(MyCallbacks
        ).reporting(keep_per_episode_custom_metrics=True)
    
    algo = config.build()
    
    num_iterations = 2
    for _ in range(num_iterations):
        result = algo.train()

However, when I try to use tune. Namely:

    tuner = tune.Tuner(
        "PPO",
        run_config=air.RunConfig(
            storage_path=run_path,
            stop={
                "training_iteration": num_iterations,
            },
            # callbacks=[MyCallbacks],
        ),
        param_space=config,
    )
result = tuner.fit()

The custom callback is ignored. As stated above, the custom callback works fine when tune is not used.
If I try to include the custom callback also in my air.RunConfig by commenting out the line above, I receive the following error:

AttributeError: type object 'MyCallbacks' has no attribute 'setup'

I also tried the following variation: callbacks=[MyCallbacks()]. This leads to the same error.

My current ray version is 2.5.1 (pip install ray[default,rllib]==2.5.1)

jacob-thrackle · May 7, 2024, 5:40pm

I am having the same issue on Ray 2.20.0 (updated from 2.8.0 which also had the same problem). Any help is appreciated.

jacob-thrackle · May 7, 2024, 8:03pm

Replying to myself:

I was able to get it to run by subclassing tune.Callback but have not yet been able to see the custom metrics in the output. Simplified Callback class for posterity:

import numpy as np
from collections import deque
# from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.tune import Callback
from ray.rllib.env import BaseEnv
from ray.rllib.policy import Policy
from ray.rllib.evaluation import RolloutWorker
from ray.rllib.evaluation.episode import Episode
from typing import Dict


class MarketCallbacks(Callback):
    def on_episode_start(
        self,
        *,
        worker: RolloutWorker,
        base_env: BaseEnv,
        policies: Dict[str, Policy],
        episode: Episode,
        env_index: int,
        **kwargs,
    ):
        # Initialize storage for prices in this episode
        episode.user_data["prices"] = []

    def on_episode_step(
        self,
        *,
        worker: RolloutWorker,
        base_env: BaseEnv,
        policies: Dict[str, Policy],
        episode: Episode,
        env_index: int,
        **kwargs,
    ):
        # Retrieve the market object from the environment
        market = base_env.get_unwrapped()[
            env_index
        ].market
        current_price = market.get_current_price()
        episode.user_data["prices"].append(current_price)

    def on_episode_end(
        self,
        *,
        worker: RolloutWorker,
        base_env: BaseEnv,
        policies: Dict[str, Policy],
        episode: Episode,
        env_index: int,
        **kwargs,
    ):
        prices = np.array(episode.user_data["prices"])
        episode.custom_metrics["max_price"] = np.max(prices)
        episode.custom_metrics["mean_price"] = np.mean(prices)
        episode.custom_metrics["std_price"] = np.std(prices)

        episode.hist_data["price"] = episode.user_data["price"]

    def on_train_result(self, *, algorithm, result: dict, **kwargs):
        # you can mutate the result dict to add new fields to return
        result["callback_ok"] = True

        p = result["custom_metrics"]["price"]
        result["custom_metrics"]["prices_max"] = np.max(p)
        result["custom_metrics"]["prices_mean"] = np.mean(p)
        result["custom_metrics"]["prices_std"] = np.std(p)

mannyv · May 8, 2024, 2:44pm

Hi @jacob-thrackle,

The terminology is overloaded which makes this confusing. The tune callbacks and rllib callbacks and they are different types. You are providing an rllib callback to the tune config. You need to pass it to the rllib AlgorithmConfig.

I could not find an example in the project but here is an issue which shows an example using both an rllib callback and a tune callback. You may need to adapt it to use the builder pattern instead if a dictionary.

github.com/ray-project/ray

[rllib] Custom metrics are duplicated

opened 01:12PM - 10 Aug 22 UTC

kayou12

bug P2 rllib observability

### What happened + What you expected to happen When logging custom metrics to …wandb via `WandbLoggerCallback` the metrics are duplicated and visible as: - `custom_metrics/*' - `sampler_resutls/custom_metrics/*` I would expect to see just `custom_metrics/*` I have implemented. ![image](https://user-images.githubusercontent.com/16837172/183909752-3077e346-0553-4732-a011-f1fbad2bdfb7.png) After some drilling I found most probably responsible line https://github.com/ray-project/ray/blob/master/rllib/algorithms/algorithm.py#L2541 with potentialy related `# TODO: Don't dump sampler results into top-level.` ### Versions / Dependencies 1.13.0 ### Reproduction script Modified example from https://github.com/ray-project/ray/blob/master/rllib/examples/custom_metrics_and_callbacks.py enriched with WandbLoggerCallback ``` """Example of using RLlib's debug callbacks. Here we use callbacks to track the average CartPole pole angle magnitude as a custom metric. """ from typing import Dict, Tuple import argparse import numpy as np import os import ray from ray import tune from ray.rllib.agents.callbacks import DefaultCallbacks from ray.rllib.env import BaseEnv from ray.rllib.evaluation import Episode, RolloutWorker from ray.rllib.policy import Policy from ray.rllib.policy.sample_batch import SampleBatch from ray.tune.integration.wandb import WandbLoggerCallback parser = argparse.ArgumentParser() parser.add_argument( "--framework", choices=["tf", "tf2", "tfe", "torch"], default="tf", help="The DL framework specifier.", ) parser.add_argument("--stop-iters", type=int, default=2000) class MyCallbacks(DefaultCallbacks): def on_episode_start( self, *, worker: RolloutWorker, base_env: BaseEnv, policies: Dict[str, Policy], episode: Episode, env_index: int, **kwargs ): # Make sure this episode has just been started (only initial obs # logged so far). assert episode.length == 0, ( "ERROR: `on_episode_start()` callback should be called right " "after env reset!" ) print("episode {} (env-idx={}) started.".format(episode.episode_id, env_index)) episode.user_data["pole_angles"] = [] episode.hist_data["pole_angles"] = [] def on_episode_step( self, *, worker: RolloutWorker, base_env: BaseEnv, policies: Dict[str, Policy], episode: Episode, env_index: int, **kwargs ): # Make sure this episode is ongoing. assert episode.length > 0, ( "ERROR: `on_episode_step()` callback should not be called right " "after env reset!" ) pole_angle = abs(episode.last_observation_for()[2]) raw_angle = abs(episode.last_raw_obs_for()[2]) assert pole_angle == raw_angle episode.user_data["pole_angles"].append(pole_angle) def on_episode_end( self, *, worker: RolloutWorker, base_env: BaseEnv, policies: Dict[str, Policy], episode: Episode, env_index: int, **kwargs ): # Check if there are multiple episodes in a batch, i.e. # "batch_mode": "truncate_episodes". if worker.policy_config["batch_mode"] == "truncate_episodes": # Make sure this episode is really done. assert episode.batch_builder.policy_collectors["default_policy"].batches[ -1 ]["dones"][-1], ( "ERROR: `on_episode_end()` should only be called " "after episode is done!" ) pole_angle = np.mean(episode.user_data["pole_angles"]) print( "episode {} (env-idx={}) ended with length {} and pole " "angles {}".format( episode.episode_id, env_index, episode.length, pole_angle ) ) episode.custom_metrics["pole_angle"] = pole_angle episode.hist_data["pole_angles"] = episode.user_data["pole_angles"] def on_sample_end(self, *, worker: RolloutWorker, samples: SampleBatch, **kwargs): print("returned sample batch of size {}".format(samples.count)) def on_train_result(self, *, trainer, result: dict, **kwargs): print( "trainer.train() result: {} -> {} episodes".format( trainer, result["episodes_this_iter"] ) ) # you can mutate the result dict to add new fields to return result["callback_ok"] = True def on_learn_on_batch( self, *, policy: Policy, train_batch: SampleBatch, result: dict, **kwargs ) -> None: result["sum_actions_in_train_batch"] = np.sum(train_batch["actions"]) print( "policy.learn_on_batch() result: {} -> sum actions: {}".format( policy, result["sum_actions_in_train_batch"] ) ) def on_postprocess_trajectory( self, *, worker: RolloutWorker, episode: Episode, agent_id: str, policy_id: str, policies: Dict[str, Policy], postprocessed_batch: SampleBatch, original_batches: Dict[str, Tuple[Policy, SampleBatch]], **kwargs ): print("postprocessed {} steps".format(postprocessed_batch.count)) if "num_batches" not in episode.custom_metrics: episode.custom_metrics["num_batches"] = 0 episode.custom_metrics["num_batches"] += 1 if __name__ == "__main__": args = parser.parse_args() ray.init() trials = tune.run( "PG", stop={ "training_iteration": args.stop_iters, }, config={ "env": "CartPole-v0", "num_envs_per_worker": 2, "callbacks": MyCallbacks, "framework": args.framework, # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0. "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")), }, callbacks=[WandbLoggerCallback( name="test", project="test", **{"api_key_file": "~/.wandb"}, )], ).trials ``` ### Issue Severity Low: It annoys or frustrates me.

https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html#ray.rllib.algorithms.algorithm_config.AlgorithmConfig

Topic		Replies	Views
Example code uses RLlib's DefaultCallbacks, but tune.run expects tune.Callback Ray Tune	1	1378	July 14, 2022
Are custom LoggerCallbacks only intended for use with Ray Tune or also for use with plain RLlib? RLlib	2	621	January 19, 2022
Add step info dictionary to MLflowLoggerCallback with Tune RLlib	7	903	July 19, 2022
Possible to access default logger from environment? RLlib	15	1480	April 27, 2021
Obtain in eager mode Actor model output in ray.tune mode using DefaultCallbacks class RLlib	0	230	October 13, 2021

Tune and custom logger fails

Related topics