How to make full use of GPU memory in Ray Tune

If you want to tune your TorchTrainer, pass it directly to the Tuner, like this:

import ray
from ray import tune
from ray.air import ScalingConfig
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker=train_func_per_worker,
    train_loop_config={
        "args": args,
    },
    scaling_config=ScalingConfig(
        num_workers=args.ray_num_workers,  # number of training workers (Ray actors) to launch
        use_gpu=args.use_gpu,
    ),
    run_config=ray.air.RunConfig(
        progress_reporter=tune.CLIReporter(max_report_frequency=600),
    ),
)

tuner = tune.Tuner(
    trainer,
    param_space={"train_loop_config": config},
    tune_config=tune.TuneConfig(
        metric="ADE",
        mode="min",
        scheduler=scheduler,
        num_samples=num_samples,
        max_concurrent_trials=args.ray_num_workers,
    ),
    # The Tuner's run_config is the one applied to the tuning run.
    run_config=ray.air.RunConfig(
        progress_reporter=tune.CLIReporter(max_report_frequency=600),
        checkpoint_config=ray.air.config.CheckpointConfig(
            num_to_keep=2,
            checkpoint_score_attribute="ADE",
            checkpoint_score_order="min",
        ),
    ),
)
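
You would then launch the sweep by calling fit() on the tuner. A minimal usage sketch, assuming the variables above (args, config, scheduler, num_samples, train_func_per_worker) are defined in your script:

# Run all trials and collect results.
results = tuner.fit()

# Inspect the best trial according to the metric configured above.
best_result = results.get_best_result(metric="ADE", mode="min")
print(best_result.config)
print(best_result.metrics["ADE"])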

Passing the trainer to the Tuner this way initializes the correct distributed backend and allocates GPU/CPU resources to each trial as intended.
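
If the goal is to drive GPU memory utilization higher by packing more trials onto each GPU, the usual lever is the ScalingConfig: requesting a fractional "GPU" per worker lets several trial workers share one device. Below is a minimal sketch, assuming two concurrent trials per GPU fit within your model's memory footprint (note that fractional GPUs are only a scheduling hint; Ray does not partition or cap the memory each process actually allocates):

scaling_config=ScalingConfig(
    num_workers=1,                        # one training worker per trial
    use_gpu=True,
    resources_per_worker={"GPU": 0.5},    # schedule two trial workers on each GPU
)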

Let me know if this is what you are trying to do, or if you are instead trying to reuse the same training function across separate Train and Tune workflows.