Lightning + hydra + ray tune basic example

alemelis · September 19, 2021, 6:49pm

Hello, I have a PyTorch Lightning model whose hyperparameters are handled by a Hydra config.

These configs are organised in different folders, which Hydra makes easy to manage. This is the template for my main config:

defaults:
  - _self_ 
  - trainer: default_trainer
  - training: default_training
  - model: default_model
  - data: default_data
  - augmentation: default_augmentation
  - transformation: default_transformation

general:
  log_dir: outputs
  inference_checkpoint: path/to/checkpoint

For each entry in the defaults list there is a separate YAML file containing the corresponding hyperparameters. For example, the training one is in conf/training/default_training.yaml and it reads:

a_w: 1.0
b_w: 1.0
c_w: 1.0
d_w: 1.0
# etc...

My (ultra-simplified) training script looks like:

def train_model(cfg):
    model = MyLightningModel(cfg)
    trainer = Trainer(**cfg.trainer)
    trainer.fit(model)

@hydra.main(config_path='conf', config_name='config')
def main(cfg: DictConfig):
    train_model(cfg)

if __name__ == '__main__':
    main()

Now, what I’d love to do is use Ray Tune to find optimal values for a_w, b_w, c_w, and d_w. By reading this guide, I managed to change the training script to:

def train_model(cfg):
    model = MyLightningModel(cfg)
    tune_callback = TuneReportCallback({"loss": "val/avg_loss"},  on="validation_end")
    trainer = Trainer(**cfg.trainer, callbacks=[tune_callback])
    trainer.fit(model)

@hydra.main(config_path='conf', config_name='config')
def main(cfg: DictConfig):
    scheduler = ASHAScheduler(max_t=10, grace_period=1, reduction_factor=2)
    reporter = CLIReporter(metric_columns=["dIOUa", "training_iteration"],
                           parameter_columns=["rmse_w"])

    cfg.training.a_w = tune.uniform(0.0, 1.0)
    cfg.training.b_w = tune.uniform(0.0, 1.0)
    cfg.training.c_w = tune.uniform(0.0, 1.0)
    cfg.training.d_w = tune.uniform(0.0, 1.0)

    analysis = tune.run(tune.with_parameters(train_hydranet,
                                             num_epochs=10,
                                             num_gpus=0),
                        resources_per_trial={
                            "cpu": 1,
                            "gpu": 0
                        },
                        metric="loss",
                        mode="min",
                        config=cfg,
                        num_samples=10,
                        scheduler=scheduler,
                        progress_reporter=reporter,
                        name="tune_mymodel")

    print("Best hyperparameters found were: ", analysis.best_config)

if __name__ == '__main__':
    main()

This, however, raises an exception:

Traceback (most recent call last):
    cfg.training.a_w = tune.uniform(0.0, 1.0)
omegaconf.errors.UnsupportedValueType: Value 'Float' is not a supported primitive type
    full_key: training.a_w
    object_type=dict

This is not super clear to me. Is it telling me that I cannot assign a Float value to a dict?

What would be the proper way of using Ray Tune with Hydra? Changing the way configs are handled is not really an option, as we rely heavily on Hydra at this point (I might need to rewrite most of the Lightning module).

Thanks a lot!

xwjiang2010 · October 26, 2021, 8:16pm

Hi,
Could you try following this? I think the workflow is pretty similar to yours.
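
For what it's worth, the exception seems to come from the fact that tune.uniform(0.0, 1.0) does not return a Python float but a search-space object (its class is named Float), and OmegaConf nodes only accept primitive values. One possible workaround, sketched here under assumptions, is to keep the search space out of the Hydra config entirely: give Tune a small dict keyed by dotted paths, and merge each trial's sampled values into a plain-dict copy of the config inside the trainable. The helper names (set_by_dotted_key, merge_sampled) and the dotted-key convention are hypothetical, not part of Hydra or Ray, and the dicts below are stand-ins so the snippet runs without either library.

```python
import copy
from typing import Any, Dict


def set_by_dotted_key(tree: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Write value at tree["a"]["b"] for dotted_key "a.b", creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = tree
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


def merge_sampled(base: Dict[str, Any], sampled: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of base with every sampled dotted key applied to it."""
    merged = copy.deepcopy(base)
    for dotted_key, value in sampled.items():
        set_by_dotted_key(merged, dotted_key, value)
    return merged


# Stand-in for OmegaConf.to_container(cfg, resolve=True): plain dicts only.
base_cfg = {"training": {"a_w": 1.0, "b_w": 1.0}, "trainer": {"max_epochs": 10}}

# Stand-in for what one trial would receive after Tune has sampled a search
# space such as {"training.a_w": tune.uniform(0.0, 1.0), ...} (assumed layout).
trial_config = {"training.a_w": 0.25, "training.b_w": 0.75}

trial_cfg = merge_sampled(base_cfg, trial_config)
print(trial_cfg["training"])
```

In the real script, the trainable would presumably rebuild the config with OmegaConf.create(merge_sampled(base_cfg, config)) before constructing MyLightningModel, and base_cfg could be handed to it via tune.with_parameters(train_fn, base_cfg=base_cfg). That way the Lightning module keeps receiving a DictConfig, and only the dict passed as config= to tune.run contains Tune sampling objects.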
