Muesli Implementation

DeepMind just published a policy-gradient algorithm named Muesli, and it shows roughly a 500% improvement over IMPALA across the Atari benchmarks.

I was wondering if this is something that might be implemented at some point. I suspect that, if implemented, it would become the most popular algorithm in RLlib.


Thanks @smorad for suggesting this! Seems like a highly complex algo to implement as it consists of so many components. But I like the different criteria they discuss in the paper aiming for a super-stable, robust, versatile algo that learns fast.
It does look like we already have most of these components in RLlib, like MAML, model-based, PPO, etc. This just brings everything together, similar to what Rainbow did for the different “add-ons” of DQN.