Ray is not using GPU after restoring experiment with tune.Tuner.restore()

Drito · December 21, 2023, 12:51pm

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

After restoring an experiment, Ray trains the restored trial on the CPU and ignores the GPU.

```python
import os

from ray import train, tune
from ray.train import RunConfig
from ray.tune.search.bayesopt import BayesOptSearch
from ray.tune.stopper import TrialPlateauStopper


# `args` and `RayTuner_CONFIG` are defined elsewhere in the original script.
def BayesOptimization(self):

    storage_path = os.path.join(RayTuner_CONFIG.RESULTS_FOLDER, f'RayTune_{args.data_mode}_{str(args.data_number_of_samples)}_{self.classes}')
    experiment_dir = os.path.join(storage_path, RayTuner_CONFIG.RAY_EXPERIMENT_NAME)

    algo = BayesOptSearch(
        metric=self.metric,
        mode=self.algo_mode
    )

    if tune.Tuner.can_restore(experiment_dir):

        # Only the raw self.model is passed on restore; the tune.with_resources
        # wrapper used in the else branch (and its GPU request) is not.
        tuner = tune.Tuner.restore(
            experiment_dir,
            trainable=self.model,
            resume_unfinished=True,
            resume_errored=True,
            restart_errored=False,
            param_space=self.hyperParams
        )
        print(f"\nRestoring experiment from: {experiment_dir}\n")

    else:

        tuner = tune.Tuner(
            tune.with_resources(
                tune.with_parameters(self.model),
                resources={"cpu": self.cpuFrac, "gpu": self.gpuFrac}
            ),
            tune_config=tune.TuneConfig(
                search_alg=algo,
                num_samples=self.analysis_numberOfsamples,
            ),
            param_space=self.hyperParams,
            run_config=RunConfig(
                name=RayTuner_CONFIG.RAY_EXPERIMENT_NAME,
                progress_reporter=self.reporter,
                storage_path=storage_path,
                local_dir=storage_path,
                stop=TrialPlateauStopper(
                    metric=self.metric,
                    mode=self.algo_mode,
                    std=0.0001,
                    num_results=10,
                    grace_period=50
                ),
                checkpoint_config=train.CheckpointConfig(
                    num_to_keep=1,
                    checkpoint_score_attribute=self.metric,
                    checkpoint_score_order=self.algo_mode
                )
            )
        )

    results = tuner.fit()

    return results
```

Is there any way to tell Ray to use GPU after restoration?

Thanks

justinvyu · December 26, 2023, 6:01pm

@Drito Can you specify the wrapped trainable in Tuner.restore?

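```python
from ray import tune

trainable = tune.with_resources(
    tune.with_parameters(self.model),
    resources={"cpu": self.cpuFrac, "gpu": self.gpuFrac}
)

# Use this same wrapped trainable in both the regular and the restore code paths:
# tune.Tuner(trainable, ...) and tune.Tuner.restore(experiment_dir, trainable=trainable, ...)
```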

Drito · January 8, 2024, 11:16am

Hello justinvyu.

Your suggestion worked perfectly.

Thanks for your help,

Leandro
