I use ray.tune.Tuner where the trainable is a TorchTrainer (the basic code structure is below). I have 1 GPU and I want to run 2 Tune trials on that one GPU at the same time, so each trial should get only 0.5 GPU. The only way I found in the docs to request a fractional GPU is tune.with_resources, but it doesn't work when the trainable is a TorchTrainer (see the sketch after the Tuner code below).
import ray
from ray import tune

tuner = tune.Tuner(
    trainer,  # a TorchTrainer instance
    tune_config=tune.TuneConfig(
        metric="best_fde",
        mode="min",
        scheduler=scheduler,
        num_samples=num_samples,
        reuse_actors=False,
    ),
    run_config=ray.air.RunConfig(
        name=tuner_dir_name,  # experiment directory name
        progress_reporter=tune.CLIReporter(max_report_frequency=600),
    ),
    param_space={
        "train_loop_config": config,
        # "scaling_config": ray.air.config.ScalingConfig(
        #     num_workers=2,
        #     resources_per_worker={
        #         "CPU": 4,
        #         "GPU": 0.5,
        #     },
        # ),
    },
)
results = tuner.fit()
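For reference, this is roughly what my tune.with_resources attempt looked like (the exact resource numbers here are placeholders); with a TorchTrainer as the trainable it had no effect on the GPU allocation:

# Sketch of the attempt: wrap the trainable in tune.with_resources,
# as the docs describe for function/class trainables.
tuner = tune.Tuner(
    tune.with_resources(trainer, {"cpu": 4, "gpu": 0.5}),
    tune_config=tune.TuneConfig(num_samples=num_samples),
)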
I also tried setting a fractional GPU in the TorchTrainer itself, which also didn't work as expected.
from ray.air.config import ScalingConfig
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker=train_func_per_worker,
    train_loop_config={
        "args": args,
    },
    scaling_config=ScalingConfig(
        num_workers=2,  # the number of workers (Ray actors) to launch
        use_gpu=args.use_gpu,
        resources_per_worker={"GPU": 0.5},
    ),
    run_config=ray.air.RunConfig(
        progress_reporter=ray.tune.CLIReporter(max_report_frequency=600),
    ),
)
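For context, a quick sanity check of what resources Ray actually sees; with my single GPU I would expect it to report 1.0:

import ray

ray.init()
# Total resources Ray detected, e.g. {"CPU": 8.0, "GPU": 1.0, ...}
print(ray.cluster_resources())
# Resources currently free; useful to watch while trials are running
print(ray.available_resources())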
So I don't know how to set the resource parameters appropriately. Please help me.