Error message:
WARNING:syncer.py:505 -- Last Sync command failed: Sync process failed: [Errno 2] Failed to open local file '<hyp_config>_train_parquet_file_1.c000'. Detail: [errno 2] No such file or directory
WARNING tune.py:919 -- Trial Runner checkpointing failed: Sync process failed: [Errno 2] Failed to open local file '<hyp_config>_valid_parquet_file_22.c000'. Detail: [errno 2] No such file or directory
I’m using Ray Tune with TensorFlow to train a deep learning model. All the training data is in S3. My workflow is as follows: download a parquet file from S3, then create a tf.data.Dataset from a generator that yields the data in that parquet file. Downloading all the parquet files up front and then creating a generator is not possible due to disk constraints, so I download one parquet file at a time, yield its data, delete it, and repeat until every training parquet file has been read. So on each iteration I download ‘train_parquet_file_{i}.c000’ and then delete it.
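A minimal sketch of the one-file-at-a-time loop described above. The download and read steps are passed in as callables; `download_fn`, `read_rows_fn`, and `stream_parquet_rows` are hypothetical names, not part of Ray or TensorFlow. The sketch writes each file into a scratch directory created with `tempfile.mkdtemp()` rather than the current working directory. That choice rests on an unconfirmed assumption: Ray Tune may run each trial with the trial directory as its working directory and sync that directory to S3, so a file downloaded and deleted there could disappear between the syncer listing it and uploading it.

```python
import os
import shutil
import tempfile
from typing import Any, Callable, Iterable, Iterator, List, Optional


def stream_parquet_rows(
    keys: List[str],
    download_fn: Callable[[str, str], None],
    read_rows_fn: Callable[[str], Iterable[Any]],
    scratch_dir: Optional[str] = None,
) -> Iterator[Any]:
    """Download one file at a time, yield its rows, then delete the file.

    download_fn(key, local_path) and read_rows_fn(local_path) are hypothetical
    hooks (e.g. a boto3 download and a pyarrow read in a real pipeline).
    """
    # Assumption: the system temp dir is outside the Tune trial directory,
    # so the Tune syncer never lists these short-lived files.
    owns_dir = scratch_dir is None
    if owns_dir:
        scratch_dir = tempfile.mkdtemp(prefix="parquet_scratch_")
    try:
        for key in keys:
            local_path = os.path.join(scratch_dir, os.path.basename(key))
            download_fn(key, local_path)
            try:
                yield from read_rows_fn(local_path)
            finally:
                # Free disk space before fetching the next file.
                if os.path.exists(local_path):
                    os.remove(local_path)
    finally:
        # Remove the scratch dir once the generator is exhausted or closed.
        if owns_dir:
            shutil.rmtree(scratch_dir, ignore_errors=True)
```

In a real pipeline, `download_fn` could wrap boto3's `s3_client.download_file(bucket, key, local_path)` (with `bucket` as your own bucket name), `read_rows_fn` could be `lambda p: pyarrow.parquet.read_table(p).to_pylist()`, and the generator could be handed to `tf.data.Dataset.from_generator` through a lambda together with a suitable `output_signature`.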
All logs and HPO results seem to be uploaded to S3 just fine. The model also trains well, and since training could not succeed without the parquet files being downloaded to the local machine, the download step itself appears to work. So I’m guessing this WARNING happens because the syncer tries to upload parquet files that I have already deleted. What exactly causes this WARNING, and how can I avoid it?