Tmp Folder Filling up with trials

Diego_Rukoz · November 5, 2023, 6:57am

I have a storage server where i can store many TB of the model files for raytune to pause and use.
currently it is filling up my HPC server that has a small drive. in the \AppData\Local\Temp\2 folder in windows. raytune is creating a bunch of folders named “tmpxxxxxxxx” with a model.keras file inside and a metadata files.

I am using V2.8

I am already changing the temp dir and current dir / more wondering what other paths i need to change to fix this!

 ray.init(configure_logging=False,_temp_dir="Z:\\LSTM\\HYPER\\TMP")
    algo = TuneBOHB()
    algo = tune.search.ConcurrencyLimiter(algo, max_concurrent=1)
    scheduler = HyperBandForBOHB(
        time_attr="training_iteration",
        max_t=6000,
        stop_last_trials=False,
    )
    
    X_train, X_test,y_train,y_test = gendata_lstm.GenerateData.GenerateData(120)
    data = {"X_t": X_train, "X_tt": X_test, "y_t":y_train , "y_tt": y_test}
    trainable_with_resources = tune.with_resources(train_mnist, {"gpu": 2})
    
    tuner = tune.Tuner(
        tune.with_parameters(trainable_with_resources, data=data),
        tune_config=tune.TuneConfig(
            metric="val_loss",
            mode="min",
            search_alg=algo,
            scheduler=scheduler,
            num_samples=1000,
        ),
        run_config=train.RunConfig(
            name="LSMT_4L",
            storage_path="\\\\192.168.1.107\\StorageDrive\\LSTM\\HYPER",
            local_dir="\\\\192.168.1.107\\StorageDrive\\LSTM\\HYPER\\TMP",
            stop={"mean_accuracy": 0.99},
        ),
        param_space={
            "lr": tune.uniform(0.000001, 0.1),
            "l1": tune.randint(512, 2048),
            "l2": tune.randint(512, 2048),
            "l3": tune.randint(512, 2048),
            "l4": tune.randint(512, 2048),
            "decay": tune.uniform(1e-5, 1e-2),
        },
    )

Diego_Rukoz · November 5, 2023, 4:11pm

Found this in the code for the it was actually hard coded in TensorflowCheckpoint
Please guys let me know if you can change this i had to change the system env variable to fix this

Teenforever · November 16, 2023, 7:04am

@Diego_Rukoz I bumped into exactly the same thing under Windows environment, and I wrote a short script to clean up the folder from time to time. It is annoying, but for the time being it does not stop me from continuing with my training.

Diego_Rukoz · November 16, 2023, 7:27am

If you can share your script that would be helpful I opened up a bug and it was resolved and merged into the nightly build sadly I ended up trying the fix they added and it was not good enough for the issue I’m facing with storage on the HPC server I wish they would just let us set the directory the temp files I read the code and it’s just a env variable but I don’t want to change it system wide

justinvyu · November 16, 2023, 6:07pm

This is the fix PR that is mentioned: [train] Update TensorFlow ReportCheckpointCallback to delete temporary directory by matthewdeng · Pull Request #41033 · ray-project/ray · GitHub

Would being able to configure the TensorflowCheckpoint temp directory as an argument in its constructor be a suitable fix?

Another workaround would be to subclass the necessary methods of ReportCheckpointCallback and TensorflowCheckpoint to suit your needs!

Diego_Rukoz · November 16, 2023, 7:49pm

Yes being able to configure the tmp directory by variable would be awesome!

ninja0n3 · July 31, 2024, 4:25pm

On Windows, setting the runtime env variables TEMP, TMP and TMPDIR to the target temp directory seems to override the %localappdata%\Temp and force the target directory set in those variables as the checkpoint tmp location.

This is achieved by forcing “tempfile.mkdtemp()” in the TensorflowCheckpoint definition to use those as the temporary windows directory.

Sam_Chan · August 6, 2024, 6:55am

This is a great tip; tahnks @ninja0n3

Topic		Replies	Views
Hyperparameter Tuning with specified session directory (!=/tmp/ray/)	4	926	January 9, 2023
Alternatives for tmp directory Ray Tune	2	1099	May 4, 2021
Set root temporary path with ray tune Ray Tune	1	2767	January 5, 2023
(raylet)file_system_monitor.cc:105: - "Object creation will fail if spilling is required" Ray Tune	5	6249	September 9, 2022
Logs and results are cleared when the program ends Ray Tune	3	254	November 9, 2023

Tmp Folder Filling up with trials

Related topics