Hello everyone,
I am trying to use PB2 from Ray Tune to tune hyperparameters with the following code, running in Google Colab:
```python
from ray import air, tune

class CustomStopper(tune.Stopper):
    def __init__(self):
        self.should_stop = False

    def __call__(self, trial_id, result):
        max_iter = 21
        if not self.should_stop and result["map"] > 0.6:
            self.should_stop = True
        return self.should_stop or result["training_iteration"] >= max_iter

    def stop_all(self):
        return self.should_stop

stopper = CustomStopper()

analysis = tune.run(
    train_ray_tune,
    resources_per_trial={"gpu": 1, "cpu": 0},
    scheduler=pb2,
    stop=stopper,
    # PBT starts by training many neural networks in parallel with random hyperparameters.
    config=search_space,
    verbose=2,
    # num_samples=4,
    checkpoint_score_attr="map",
    keep_checkpoints_num=1,
    local_dir="/content/drive/MyDrive/trash/Step2_tuning/carton/PB2/carton_raytune_pb2_24_08_2022/",
    name="carton_raytune_pb2_version3")
```
After some iterations, I received a "run out of disk" warning, even though I specified that my local_dir is on my Google Drive. After some checking, I found that the "/root" directory's size was about 115 GB:
```
(raylet) [2022-08-30 06:17:40,039 E 707 732] (raylet) file_system_monitor.cc:105: /tmp/ray/session_2022-08-29_18-18-07_337177_442 is over 95% full, available space: 6109593600; capacity: 179134558208. Object creation will fail if spilling is required.
115G /root/
```
When I kill the Ray Tune process, "/root" is freed, so I guess "/root" is being used for some temporary work?
My question is: how can I limit the disk space used by Ray Tune so that my disk does not fill up again?
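One workaround I am considering (I am not sure it is the right approach): pointing Ray's session/temp directory somewhere with more space via the `_temp_dir` argument of `ray.init`, which I believe controls where the `/tmp/ray/session_*` data (and object spilling) goes. The `TEMP_DIR` path below is just an example location on my Drive. A minimal sketch:

```python
# Sketch: relocate Ray's temp/session directory before calling tune.run.
# TEMP_DIR is an assumed writable location; _temp_dir is the ray.init
# argument I believe selects the session directory (normally /tmp/ray).
TEMP_DIR = "/content/drive/MyDrive/trash/ray_tmp"

def init_ray_kwargs(temp_dir):
    """Build the kwargs I would pass to ray.init (unverified sketch)."""
    return {"_temp_dir": temp_dir}

# Usage (in Colab, before tune.run):
# import ray
# ray.init(**init_ray_kwargs(TEMP_DIR))
```

Would this be enough, or is there a separate setting for limiting how much the raylet spills to disk?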
Thanks a lot for your help!