If you want to tune your TorchTrainer, you can pass it directly to a Tuner, like this:
import ray
from ray import air, tune
from ray.air import ScalingConfig
from ray.train.torch import TorchTrainer

# Assumes train_func_per_worker, args, config, scheduler, and num_samples
# are defined elsewhere in your script.
trainer = TorchTrainer(
    train_loop_per_worker=train_func_per_worker,
    train_loop_config={
        "args": args,
    },
    scaling_config=ScalingConfig(
        num_workers=args.ray_num_workers,  # The number of workers (Ray actors) to launch
        use_gpu=args.use_gpu,
    ),
    # Note: when the trainer is passed to a Tuner, the Tuner's run_config takes precedence.
    run_config=ray.air.RunConfig(
        progress_reporter=ray.tune.CLIReporter(max_report_frequency=600),
    ),
)

tuner = tune.Tuner(
    trainer,
    # Everything under "train_loop_config" is passed to train_func_per_worker per trial.
    param_space={"train_loop_config": config},
    tune_config=tune.TuneConfig(
        metric="ADE",
        mode="min",
        scheduler=scheduler,
        num_samples=num_samples,
        max_concurrent_trials=args.ray_num_workers,
    ),
    run_config=ray.air.RunConfig(
        progress_reporter=tune.CLIReporter(max_report_frequency=600),
        checkpoint_config=ray.air.config.CheckpointConfig(
            num_to_keep=2,
            checkpoint_score_attribute="ADE",
            checkpoint_score_order="min",
        ),
    ),
)
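The config, scheduler, and num_samples above are assumed to be defined elsewhere in your script. For reference, here is a minimal sketch of what they might look like; the hyperparameter names ("lr", "batch_size") and the ASHAScheduler settings are hypothetical and should match whatever your train_func_per_worker actually reads from its train_loop_config:

from ray import tune
from ray.tune.schedulers import ASHAScheduler

# Hypothetical search space; keys must match what train_func_per_worker expects.
config = {
    "lr": tune.loguniform(1e-4, 1e-2),
    "batch_size": tune.choice([32, 64, 128]),
}

# ASHA early-stops underperforming trials based on the "ADE" metric
# set in TuneConfig above.
scheduler = ASHAScheduler(max_t=50, grace_period=5)

num_samples = 10  # Number of hyperparameter configurations to try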
This will initialize the correct distributed backend for each trial and allocate GPU/CPU resources according to the ScalingConfig.
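For completeness, a minimal sketch of running the tuner and pulling out the best trial, assuming the setup above:

results = tuner.fit()

# Pick the best trial according to the same metric/mode used in TuneConfig.
best_result = results.get_best_result(metric="ADE", mode="min")
print(best_result.config)      # Best hyperparameters found
print(best_result.checkpoint)  # Best checkpoint, kept per CheckpointConfig above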
Let me know if this is what you are trying to do, or if you are instead trying to reuse the training function across separate Train and Tune workflows.