Stacking callback objects [Solved. Code included.]

RickLan · April 27, 2021, 2:39am

Is there an elegant way to stack DefaultCallbacks objects?

I have one that mutates sample batch in on_postprocess_trajectory() and another that logs custom metrics in on_episode_end(). Right now I merge the code into one class, which is not satisfying (very not OOP).

If I pass them in as a list, I get an type error about not a DefaultCallbacks type.

class MyCallbacks(DefaultCallbacks):
  def on_postprocess_trajectory(self,
  # etc

class AnotherCallbacks(DefaultCallbacks):
  def on_episode_end(self,
  # etc

config = {
  "env" : AgentEnv,
  "lr" : 1e-4, 
  "callbacks": [MyCallbacks, AnotherCallbacks],
}

Edit: typo

mannyv · April 27, 2021, 2:48am

Personally I do not see an issue with merging them together but another alternative would be to have one inherit from the other.

class MyCallbacks(DefaultCallbacks):
  def on_postprocess_trajectory(self,
  # etc

class AnotherCallbacks(MyCallbacks):
  def on_episode_end(self,
  #if needed
  #super().episode_end(...)
  # etc

RickLan · April 27, 2021, 4:05am

Inheritance would work perfectly. However, I was looking for a solution via configuration. Main reason is that if a new callbacks class is written, new inheritance code (also for combinations of different callbacks!) also need to be written. Here is a merge function that can take any number of callback classes:

gist.github.com

https://gist.github.com/rlan/1b050a95536913f632742d72a84ae382

merge_callbacks.py

from typing import Dict, List, Optional, TYPE_CHECKING
from ray.rllib.agents.callbacks import DefaultCallbacks

from ray.rllib.env import BaseEnv
from ray.rllib.policy import Policy
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.evaluation import MultiAgentEpisode
from ray.rllib.utils.annotations import PublicAPI
from ray.rllib.utils.deprecation import deprecation_warning
from ray.rllib.utils.typing import AgentID, PolicyID

This file has been truncated. show original

Edit: move code to a Github gist.

sven1977 · April 27, 2021, 12:22pm

Hey @mannyv and @RickLan , I’m currently also thinking about making the callbacks API better by replacing it with an events-based system, where anyone (user, algos, exploration components, models, etc…) can subscribe to built-in events (such as “episode_step”, “learn_on_batch”, etc…), but also define custom events and then be able to trigger those as well, in case someone else wants to subscribe to these. This would have the following advantages:

We could define single callables to handle an event.
These could still be methods of an object (such as something similar to DefaultCallbacks) to allow keeping some state between different events.
We could get rid of some of the build-policy/trainer options, e.g. postprocess_trajectory_fn, action_sampler_fn, etc… and replace them by events to which an algo/policy then subscribes.
We could probably get rid of the entire exploration API (which is - in a way - already purely event based).
While we are at it, make sure storing custom metrics (of any form) is intuitive and easy. This currently seems to be the biggest problem in this API, that users don’t know what type of data they can store and where they should store it.

Just some brainstorming on callbacks, which have been found to be unintuitive by many users. Let me know your thoughts on this.

mannyv · April 27, 2021, 2:03pm

Hi @RickLan,

Yes there are multiple ways to address this. I like the forced inheritance method because it encourages the developer to reason about the chain of callbacks and conflicts that might exist between them. Imagine if two callbacks are mutating state in incompatible ways. I think adding classes to a list encourages this less or at least it would for me. We design against our own failings right? This said, I think both are perfectly workable solutions and like many things comes down to preference.

@sven1977, I think that would be a nice feature. My main worry would be detecting and warning users when event handlers are stomping on each other. I have used callbacks quite a bit to change the observations, add new keys, etc. The order of handlers will matter in some cases. I can think of cases where I might have to use the techniques above, i.e. manually chaining handlers in a list or through inheritance to enforce a guaranteed ordering. If that happens then you will have written this nice event driven interface that is engineered around to get to a place that already exists.

sven1977 · April 27, 2021, 3:03pm

Hey @mannyv , totally agree. The (hidden) order would be the biggest weakness of such an even based system. E.g. for reward-tampering callbacks (e.g. curiosity adding an internal reward, then the user doesn’t know, whether this already happened or not). However, the ordering conflicts that you describe are more on the user end (“user callback 1 vs user callback 2”), not really “algo (or algo-plugin) vs user”.
You would still be able to handle this with inheritance of different Callbacks classes.

Either way, maybe we should start with simply making the metrics a little more intuitive and allowing any type of data to be stored (I think this is limited right now to scalars and video types ??). Then leave the API redesign for some time later.

RickLan · April 29, 2021, 10:50am

A third way: both also works together

RickLan · April 29, 2021, 11:05am

I’m not a software design guru, but I always like better tools
Edit: I probably don’t know enough about RLlib.

From my point of view, as an user/researcher, if the documentation explains, e.g. how the event-based system works, with plenty of examples, then I would be hitting the ground running. I think you are building a very powerful tool. Sometimes it is not intuitive to me to combine two features (ie my question about LR schedule in tune.run()). Examples can serve that purpose. Of course, I also appreciate having a responsive discussion like here.

RickLan · April 29, 2021, 11:19am

Yes! Image support would be nice. I hacked it here: [RLlib] Writing to tensorboard during custom evaluation - #3 by RickLan

smorad · April 29, 2021, 11:54am

I would love some better tensorboard logging. E.g. array of images would be nice (instead of being converted to video) for visualising autoencoder reconstruction, etc.

For the events, maybe it makes sense to give them very descriptive name, trace once at startup (similar to trajectory_view ), and print them out. E.g.

Event Order:
OnEpisodeStep
  AddAdditionalMetrics
  PostprocessMyEpisodes
OnTrainStep
  CustomLoss
    CuriosityExplorationReward
    AuxiliaryLossReward

sven1977 · April 29, 2021, 2:07pm

Yeah, this is a great idea! We should do the same for the trajectory view API: Actually print out some results from the pre-passes. At least for logging level INFO.

mannyv · April 29, 2021, 10:48pm

@smorad,

This is a great idea. Of course you know someone, probably one of us actually, will come along and try to dynamically add and remove event listeners.

smorad · April 30, 2021, 11:29am

I’m not sure we’ll have to add/remove dynamically. An alternative would be inheriting from a parent class and adding a switch, e.g.:

class CustomCuriosityExplorationReward(CuriosityExplorationReward):

   def custom_loss(self, *args, **kwargs)
     if self.enabled:
        return super().custom_loss(args, kwargs)

Topic		Replies	Views
Persisting values across callbacks RLlib	2	600	May 8, 2023
Example code uses RLlib's DefaultCallbacks, but tune.run expects tune.Callback Ray Tune	1	1281	July 14, 2022
Printing input_dict["obs_flat"] with custom DefaultCallbacks RLlib	5	449	December 16, 2021
Logging from DefaultCallbacks to LoggerCallback gives weird behavior RLlib	1	340	June 9, 2022
Logging stuff in a custom gym environment using RLlib and Tune RLlib	4	1368	June 1, 2022

Stacking callback objects [Solved. Code included.]

Related topics