How to make full use of GPU memory in Ray Tune

If you want to tune your TorchTrainer, pass it directly to the Tuner, like this:

import ray
from ray import tune
from ray.air import ScalingConfig
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker=train_func_per_worker,
    train_loop_config={
        "args": args,
    },
    scaling_config=ScalingConfig(
        num_workers=args.ray_num_workers,  # number of training workers (Ray actors) to launch
        use_gpu=args.use_gpu,
    ),
    run_config=ray.air.RunConfig(
        progress_reporter=tune.CLIReporter(max_report_frequency=600),
    ),
)

tuner = tune.Tuner(
    trainer,
    param_space={"train_loop_config": config},
    tune_config=tune.TuneConfig(
        metric="ADE",
        mode="min",
        scheduler=scheduler,
        num_samples=num_samples,
        max_concurrent_trials=args.ray_num_workers,
    ),
    # The Tuner's run_config is the one applied to the tuning run.
    run_config=ray.air.RunConfig(
        progress_reporter=tune.CLIReporter(max_report_frequency=600),
        checkpoint_config=ray.air.config.CheckpointConfig(
            num_to_keep=2,
            checkpoint_score_attribute="ADE",
            checkpoint_score_order="min",
        ),
    ),
)
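
You would then launch the sweep by calling fit() on the tuner. A minimal usage sketch, assuming the variables above (args, config, scheduler, num_samples, train_func_per_worker) are defined in your script:

# Run all trials and collect results.
results = tuner.fit()

# Inspect the best trial according to the metric configured above.
best_result = results.get_best_result(metric="ADE", mode="min")
print(best_result.config)
print(best_result.metrics["ADE"])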

Passing the trainer to the Tuner this way initializes the correct distributed backend and allocates GPU/CPU resources to each trial as intended.
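
If the goal is to drive GPU memory utilization higher by packing more trials onto each GPU, the usual lever is the ScalingConfig: requesting a fractional "GPU" per worker lets several trial workers share one device. Below is a minimal sketch, assuming two concurrent trials per GPU fit within your model's memory footprint (note that fractional GPUs are only a scheduling hint; Ray does not partition or cap the memory each process actually allocates):

scaling_config=ScalingConfig(
    num_workers=1,                        # one training worker per trial
    use_gpu=True,
    resources_per_worker={"GPU": 0.5},    # schedule two trial workers on each GPU
)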

Let me know if this is what you are trying to do, or if you are instead trying to reuse the same training function across separate Train and Tune workflows.