"/root" is overused by Ray Tune and kills the notebook with "run out of disk"

I am trying to use PB2 from Ray Tune to tune hyperparameters in Google Colab with this code:

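```python
from ray import air, tune


class CustomStopper(tune.Stopper):
    def __init__(self):
        self.should_stop = False

    def __call__(self, trial_id, result):
        max_iter = 21
        # Once any trial reports mAP > 0.6, flag the whole experiment to stop.
        if not self.should_stop and result["map"] > 0.6:
            self.should_stop = True
        # Stop this trial if the flag is set or it reached max_iter iterations.
        return self.should_stop or result["training_iteration"] >= max_iter

    def stop_all(self):
        # Returning True ends every trial in the experiment.
        return self.should_stop


stopper = CustomStopper()
analysis = tune.run(
    # ... (trainable, PB2 scheduler, local_dir and other arguments not shown in the post)
    resources_per_trial={"gpu": 1, "cpu": 0},
    # PBT starts by training many neural networks in parallel with random hyperparameters.
)
```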


After some iterations, I received a “run out of disk” warning, even though I had set my local dir to a folder in my Google Drive. I checked and found that the “/root” directory was about 115 gigabytes.

```
(raylet) [2022-08-30 06:17:40,039 E 707 732] (raylet) file_system_monitor.cc:105: /tmp/ray/session_2022-08-29_18-18-07_337177_442 is over 95% full, available space: 6109593600; capacity: 179134558208. Object creation will fail if spilling is required.
115G /root/
```

When I kill the Ray Tune process, the space in “/root” is freed. So I guess “/root” is used for some temporary work?
My question is: how can we limit the disk space used by Ray Tune so that the disk does not run out again?

@jjyao Could these be session logs? Do you know how big they are expected to be?

@xwjiang2010 @jjyao Hello, have you found the reason for this problem?
Could you please suggest some solutions I could test in this case?

@Khoi_LE sorry for the late reply.

I’m trying to understand the issue. Are you saying this warning message /tmp/ray/session_2022-08-29_18-18-07_337177_442 is over 95% full, available space: 6109593600; capacity: 179134558208. Object creation will fail if spilling is required. kills the notebook? This uses /tmp/ray instead of /root.

I think it’s not that simple. The notebook was not killed, but it stopped because it ran out of disk. I checked the size of “/tmp/ray”: as I remember, it was about 3.4 GB. But “/root” was about 115 GB, and it shrank (slowly, down to 0) after I killed the tuning process. So, from what I observed, a lot of disk space is used in “/root” by default during the run. I hope there is a way to act when the disk is almost full, for example by cleaning up files that are no longer in use, to solve this problem.
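One way to act before the disk fills up is to check free space periodically. A minimal sketch using only the standard library: `shutil.disk_usage` reports total and free bytes for the filesystem containing a path, and the 0.95 threshold mirrors the “over 95% full” warning in the log above. The helper names `used_fraction` and `disk_almost_full` are hypothetical, not part of Ray.

```python
import shutil


def used_fraction(available: int, capacity: int) -> float:
    """Fraction of the filesystem in use, from available and total byte counts."""
    return 1.0 - available / capacity


def disk_almost_full(path: str = "/", threshold: float = 0.95) -> bool:
    """Return True when the filesystem holding `path` is more than `threshold` full."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return used_fraction(usage.free, usage.total) > threshold


if __name__ == "__main__":
    # Numbers taken from the raylet warning in this thread.
    frac = used_fraction(6109593600, 179134558208)
    print(f"{frac:.1%} used")  # about 96.6%, i.e. above the 95% threshold
    print("almost full:", disk_almost_full("/"))
```

An untested idea: the custom stopper's `stop_all()` could return `self.should_stop or disk_almost_full("/")`, so the experiment ends cleanly instead of the disk filling up.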


Could you check which files are using the /root space? I don’t think Ray writes things to /root.
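A quick way to answer this is to walk the directory tree and sort files by size. A minimal sketch using only the standard library; `largest_files` is a hypothetical helper name, and `du -sh /root/*` in a shell cell gives a similar per-directory view.

```python
import heapq
import os
from typing import List, Tuple


def largest_files(root: str, n: int = 10) -> List[Tuple[int, str]]:
    """Return the n largest regular files under `root` as (size_bytes, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so the same data is not counted twice
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable (common for temp/cache files)
    return heapq.nlargest(n, sizes)


if __name__ == "__main__":
    for size, path in largest_files("/root", n=10):
        print(f"{size / 1e9:8.2f} GB  {path}")
```

Pointing it at `/tmp/ray` instead would also show how large the session logs are.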


I think it does, because when I kill the Ray Tune process, disk usage goes down. I checked, and this is the heaviest file in /root:

And this is its content:

After some research, I think this is caused by setting “local_dir” to a folder in Google Drive. Google Drive File Stream needs a lot of cache to sync files from the local machine to My Drive. So if we choose a local_dir on the local virtual machine and copy it to Google Drive every few steps, we would interact with Drive much less and may avoid this kind of problem.
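The workaround above could look roughly like this: Tune writes results to a directory on the Colab VM, and a background thread copies that directory to Drive at a fixed interval. A minimal sketch using only the standard library; the paths `/content/ray_results` and `/content/drive/MyDrive/ray_results`, the 10-minute interval, and the helpers `backup_to_drive` and `start_periodic_backup` are hypothetical choices, not something tested in this thread.

```python
import os
import shutil
import threading

# Hypothetical paths: Tune writes results to the VM's local disk,
# and Google Drive is assumed to be mounted at /content/drive.
LOCAL_DIR = "/content/ray_results"
DRIVE_DIR = "/content/drive/MyDrive/ray_results"


def backup_to_drive(src: str = LOCAL_DIR, dst: str = DRIVE_DIR) -> None:
    """Copy the local results tree to Drive, overwriting files that already exist."""
    if not os.path.isdir(src):
        return  # nothing has been written yet
    shutil.copytree(src, dst, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+


def start_periodic_backup(interval_s: float = 600.0) -> threading.Event:
    """Run backup_to_drive every interval_s seconds in a daemon thread.

    Call .set() on the returned Event to stop the loop.
    """
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            backup_to_drive()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Usage would be `stop = start_periodic_backup()`, then `tune.run(..., local_dir=LOCAL_DIR)`, then `stop.set()` followed by one final `backup_to_drive()`. A file that Tune is writing at the moment of a copy may be copied in a partial state; the next backup overwrites it.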

