Hello,
I am running some experiments with Tune on a machine with 16 CPU cores and 1 GPU, configured with --cpus-per-trial 5 and --gpus-per-trial 0.5. I am obliged to give each trial half of the GPU because I notice severe CPU RAM overhead: the tune.run() call leaves many trials pending (17).
In the system monitor I indeed see memory growing rapidly, with processes that occupy RAM but are not actually running, since only two trials at a time fit on the GPU.
So basically, I can only run two trials in parallel, and my machine runs out of CPU RAM well before it runs out of GPU RAM. I would like to decrease the number of pending trials so that I can exploit the GPU to its full potential.
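To make the arithmetic explicit (a minimal back-of-the-envelope sketch, not Tune code, assuming Tune simply starts as many trials as fit into the requested CPU/GPU budget):

total_cpus, total_gpus = 16, 1
cpus_per_trial, gpus_per_trial = 5, 0.5

max_by_cpu = total_cpus // cpus_per_trial      # 3 trials fit by CPU
max_by_gpu = int(total_gpus / gpus_per_trial)  # 2 trials fit by GPU
print(min(max_by_cpu, max_by_gpu))             # 2 trials run; the rest sit pending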
I checked the documentation and found that the environment variable TUNE_MAX_PENDING_TRIALS_PG can be set to change this behavior.
So far, I haven’t succeeded, though, and I would greatly appreciate any help.
For example, the naive modification at the top of the script shown below doesn't change anything:
os.environ["TUNE_MAX_PENDING_TRIALS_PG"] = "1"
...
result = tune.run(
ray_train_classifier,
name=args.exp_name,
resources_per_trial={"cpu": args.cpus_per_trial, "gpu": args.gpus_per_trial},
config=config,
num_samples=args.num_trials,
progress_reporter=reporter,
local_dir=args.log_dir,
checkpoint_at_end=args.checkpoint_at_end,
resume=args.resume
)
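Could import order matter here? A minimal sketch of what I mean, assuming (I am not sure) that the variable is only picked up if it is defined before Ray Tune is first imported:

import os

# Set the variable before anything from Ray is imported, in case it is
# read at import time (an assumption on my part, not confirmed by the docs).
os.environ["TUNE_MAX_PENDING_TRIALS_PG"] = "1"

from ray import tune  # imported only after the variable is set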