Stacking callback objects [Solved. Code included.]

Is there an elegant way to stack DefaultCallbacks objects?

I have one that mutates the sample batch in on_postprocess_trajectory() and another that logs custom metrics in on_episode_end(). Right now I merge the code into one class, which is not satisfying (not very OOP).

If I pass them in as a list, I get a TypeError saying it is not a DefaultCallbacks type.

class MyCallbacks(DefaultCallbacks):
  def on_postprocess_trajectory(self,
  # etc

class AnotherCallbacks(DefaultCallbacks):
  def on_episode_end(self,
  # etc

config = {
  "env": AgentEnv,
  "lr": 1e-4,
  "callbacks": [MyCallbacks, AnotherCallbacks],
}

Edit: typo

Personally I do not see an issue with merging them together, but another alternative would be to have one inherit from the other:

class MyCallbacks(DefaultCallbacks):
  def on_postprocess_trajectory(self,
  # etc

class AnotherCallbacks(MyCallbacks):
  def on_episode_end(self,
  #if needed
  # etc

Inheritance would work perfectly. However, I was looking for a solution via configuration. The main reason is that whenever a new callbacks class is written, new inheritance code (including for every combination of different callbacks!) also needs to be written. Here is a merge function that can take any number of callback classes:

Edit: moved the code to a GitHub gist.
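The actual code lives in the linked gist; as a minimal sketch of the idea (plain Python, without the RLlib DefaultCallbacks base class, and with a hypothetical name merge_callbacks), it could look like this:

```python
def merge_callbacks(*callback_classes):
    """Return a class whose instances dispatch any callback method to
    one instance of each given class, in the order they were passed."""

    class MergedCallbacks:
        def __init__(self):
            self._instances = [cls() for cls in callback_classes]

        def __getattr__(self, name):
            # Only reached for attributes not found normally: build a
            # dispatcher that invokes the method on every wrapped
            # instance that defines it, returning the last result.
            def dispatch(*args, **kwargs):
                result = None
                for instance in self._instances:
                    method = getattr(instance, name, None)
                    if callable(method):
                        result = method(*args, **kwargs)
                return result

            return dispatch

    return MergedCallbacks
```

With RLlib one would instead subclass DefaultCallbacks and forward only its known hook methods, but the dispatch idea is the same.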

Hey @mannyv and @RickLan , I’m currently also thinking about making the callbacks API better by replacing it with an events-based system, where anyone (user, algos, exploration components, models, etc…) can subscribe to built-in events (such as “episode_step”, “learn_on_batch”, etc…), but also define custom events and then be able to trigger those as well, in case someone else wants to subscribe to these. This would have the following advantages:

  • We could define single callables to handle an event.
  • These could still be methods of an object (such as something similar to DefaultCallbacks) to allow keeping some state between different events.
  • We could get rid of some of the build-policy/trainer options, e.g. postprocess_trajectory_fn, action_sampler_fn, etc… and replace them by events to which an algo/policy then subscribes.
  • We could probably get rid of the entire exploration API (which is - in a way - already purely event based).
  • While we are at it, we should make sure storing custom metrics (of any form) is intuitive and easy. This currently seems to be the biggest problem with this API: users don’t know what type of data they can store or where they should store it.

Just some brainstorming on callbacks, which have been found to be unintuitive by many users. Let me know your thoughts on this.
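As a hedged illustration of the subscribe/trigger idea sketched above (all names here, such as EventBus, are hypothetical and not part of RLlib):

```python
class EventBus:
    """Minimal publish/subscribe hub: components register handlers for
    named events, and anyone can trigger an event with a payload."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event, handler):
        # Register a handler (any callable) for a named event; custom
        # event names work the same way as built-in ones.
        self._subscribers.setdefault(event, []).append(handler)

    def trigger(self, event, **payload):
        # Handlers run in subscription order; what ordering to
        # guarantee is exactly the open question in this thread.
        for handler in self._subscribers.get(event, []):
            handler(**payload)
```

A user callback and, say, an exploration component could then both subscribe to "episode_step" without knowing about each other.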

Hi @RickLan,

Yes, there are multiple ways to address this. I like the forced inheritance method because it encourages the developer to reason about the chain of callbacks and the conflicts that might exist between them. Imagine two callbacks mutating state in incompatible ways. I think adding classes to a list encourages this reasoning less, or at least it would for me. We design against our own failings, right? That said, both are perfectly workable solutions, and like many things it comes down to preference.

@sven1977, I think that would be a nice feature. My main worry would be detecting and warning users when event handlers are stomping on each other. I have used callbacks quite a bit to change the observations, add new keys, etc. The order of handlers will matter in some cases. I can think of cases where I would have to fall back on the techniques above, i.e. manually chaining handlers in a list or through inheritance, to enforce a guaranteed ordering. If that happens, you will have written this nice event-driven interface only to have it engineered around to get back to a place that already exists.

Hey @mannyv, totally agree. The (hidden) order would be the biggest weakness of such an event-based system, e.g. for reward-tampering callbacks (say, curiosity adding an intrinsic reward: the user then doesn’t know whether this has already happened or not). However, the ordering conflicts you describe are more on the user end (“user callback 1 vs user callback 2”), not really “algo (or algo plugin) vs user”.
You would still be able to handle those with inheritance of different Callbacks classes.

Either way, maybe we should start by simply making the metrics a little more intuitive and allowing any type of data to be stored (I think this is limited right now to scalars and video types?). Then leave the API redesign for later.


A third way: both approaches also work together :slight_smile:
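For instance (a sketch with made-up class names), one callbacks class can inherit from another and still keep the parent hook via super(), so inheritance chains and list-style merging can be mixed freely:

```python
class MetricsCallbacks:
    """Base callbacks class: logs custom metrics at episode end."""

    def on_episode_end(self, **kwargs):
        print("logging custom metrics")


class BatchAndMetricsCallbacks(MetricsCallbacks):
    """Child class: adds batch mutation and extends the parent hook."""

    def on_postprocess_trajectory(self, **kwargs):
        print("mutating the sample batch")

    def on_episode_end(self, **kwargs):
        # Keep the parent's metric logging, then add extra behavior.
        super().on_episode_end(**kwargs)
        print("extra per-episode bookkeeping")
```

The combined class could then itself be one entry in a merged list of callback classes.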


I’m not a software design guru, but I always like better tools :slight_smile:
Edit: I probably don’t know enough about RLlib.

From my point of view, as a user/researcher, if the documentation explains, e.g., how the event-based system works, with plenty of examples, then I would hit the ground running. I think you are building a very powerful tool. Sometimes it is not intuitive to me how to combine two features (e.g. my question about the LR schedule); examples can serve that purpose. Of course, I also appreciate having a responsive discussion like this one.

Yes! Image support would be nice. I hacked it here: [RLlib] Writing to tensorboard during custom evaluation - #3 by RickLan


I would love some better tensorboard logging, e.g. an array of images (instead of being converted to a video) for visualising autoencoder reconstructions, etc.

For the events, maybe it makes sense to give them very descriptive names, trace once at startup (similar to trajectory_view), and print them out, e.g.:

Event Order:
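The printed summary itself was left out of the original post; a hypothetical sketch of such a startup trace (assuming a dict mapping event names to ordered handler lists, and a made-up name format_event_order) might be:

```python
def format_event_order(subscribers):
    """Render a startup trace showing which handlers fire for each
    event, in the order they would run."""
    lines = ["Event Order:"]
    for event, handlers in subscribers.items():
        chain = " -> ".join(handler.__name__ for handler in handlers)
        lines.append(f"  {event}: {chain}")
    return "\n".join(lines)
```

Printing this once at startup would make the (otherwise hidden) handler ordering visible to users.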

Yeah, this is a great idea! We should do the same for the trajectory view API: actually print out some results from the pre-passes, at least at logging level INFO.


This is a great idea. Of course, you know someone (probably one of us, actually) will come along and try to dynamically add and remove event listeners.

I’m not sure we’ll have to add/remove dynamically. An alternative would be inheriting from a parent class and adding a switch, e.g.:

class CustomCuriosityExplorationReward(CuriosityExplorationReward):

   def custom_loss(self, *args, **kwargs):
     # Only apply the parent's curiosity loss while the switch is on.
     if self.enabled:
        return super().custom_loss(*args, **kwargs)