Advanced evaluation with wandb, RLlib and Tune (weight, gradient, and activation histograms)

Hey all,

I was wondering whether there is any way to get weight, gradient, and activation histograms working with the Weights & Biases callback for Tune and the plain PPO algorithm (Torch)?

One of the main parts of training deep RL models is keeping the DNNs used for function approximation healthy, so I was surprised that there is currently no easy way to get this working.

I thought about two approaches.

  1. A custom callback:
import wandb
from ray.rllib.agents.callbacks import DefaultCallbacks

class MyCallback(DefaultCallbacks):
    def on_train_result(self, *, trainer, result: dict, **kwargs):
        # Attach one histogram per layer of the policy network to the results.
        for layer_name, value in trainer.get_policy().get_weights().items():
            result[layer_name] = wandb.Histogram(value)
  2. Accessing the Torch model directly and using the normal wandb callback:
# This is not actual code, but rather pseudo code; for a Torch policy,
# trainer.get_policy().model should be the underlying nn.Module.
pytorch_model = trainer.get_policy().model
wandb.watch(pytorch_model, log_freq=100)

However, neither approach works due to limitations in Ray. In the first one, wandb.Histogram is not an allowed data type for result values, so the Tune wandb integration ignores it and nothing is reported to the wandb dashboard.
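
One workaround sketch I came up with (but have not verified end to end) is to skip Tune's WandbLoggerCallback for these metrics and log the histograms straight to wandb from inside the RLlib callback. The project name and the lazy wandb.init inside on_train_result are my assumptions, and the import path is the Ray 1.x one (newer versions moved DefaultCallbacks to ray.rllib.algorithms.callbacks):

import numpy as np
import wandb
from ray.rllib.agents.callbacks import DefaultCallbacks

class WandbHistogramCallback(DefaultCallbacks):
    """Sketch: log weight histograms directly to wandb, bypassing Tune."""

    def on_train_result(self, *, trainer, result: dict, **kwargs):
        # Lazily start a run in the trainer process (callbacks are also
        # instantiated on the rollout workers, where we don't want a run).
        if wandb.run is None:
            wandb.init(project="rllib-histograms")  # hypothetical project name
        hists = {
            f"weights/{name}": wandb.Histogram(np.asarray(value))
            for name, value in trainer.get_policy().get_weights().items()
        }
        wandb.log(hists, step=result["training_iteration"])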

The second one does not work either, because (to my knowledge) you can't call wandb methods directly, as they conflict with Tune's WandbLoggerCallback.
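
If that conflict can be avoided by not using the Tune integration at all, a concrete version of the second approach might look like the sketch below. It assumes the wandb run is managed manually (as in the sketch above) and that trainer.get_policy().model is the torch nn.Module, which may differ across RLlib versions:

import wandb
from ray.rllib.agents.callbacks import DefaultCallbacks

class WandbWatchCallback(DefaultCallbacks):
    """Sketch: hook wandb.watch onto the policy's torch model once."""

    def __init__(self):
        super().__init__()
        self._watching = False

    def on_train_result(self, *, trainer, result: dict, **kwargs):
        if not self._watching:
            if wandb.run is None:
                wandb.init(project="rllib-watch")  # hypothetical project name
            # log="all" asks wandb to record both gradients and parameters.
            wandb.watch(trainer.get_policy().model, log="all", log_freq=100)
            self._watching = True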

I would highly appreciate any feedback on whether this is possible and whether there are plans to make it work seamlessly.

PS: Other custom metrics are working like a charm, thanks Ray team.


An official guide on how to use wandb.watch with RLlib models would be super cool!