Muesli Implementation

smorad · May 3, 2021, 6:56pm

DeepMind just published a policy-gradient algorithm named Muesli, and it shows roughly 500% improvement over Impala across the Atari benchmarks: https://arxiv.org/pdf/2104.06159.pdf

I was wondering if this is something that might be implemented at some point. I suspect that, if implemented, it would become the most popular algorithm in RLlib.

sven1977 · May 4, 2021, 1:46pm

Thanks @smorad for suggesting this! Seems like a highly complex algo to implement as it consists of so many components. But I like the different criteria they discuss in the paper aiming for a super-stable, robust, versatile algo that learns fast.
It does look like we already have most of these components in RLlib, like MAML, model-based, PPO, etc… This just brings everything together, similar to what Rainbow did for the different “add-ons” of DQN.
