I am running my .py script inside a pod on OpenShift Container Platform. It runs fine, but at the end, when I try to get the path to the best checkpoint, I get the following error:
  File "test_hyperopt.py", line 127, in <module>
    logger.info("Best Checkpoint directory: \n{}\n".format(analysis.get_best_checkpoint(best_trial, metric="score", mode="max")))
  File "/opt/conda/lib/python3.8/site-packages/ray/tune/analysis/experiment_analysis.py", line 469, in get_best_checkpoint
    return TrialCheckpoint(local_path=best_path, cloud_path=cloud_path)
  File "/opt/conda/lib/python3.8/site-packages/ray/tune/cloud.py", line 86, in __init__
    Checkpoint.__init__(self, uri=PLACEHOLDER)
  File "/opt/conda/lib/python3.8/site-packages/ray/ml/checkpoint.py", line 131, in __init__
    local_path = _get_local_path(uri)
  File "/opt/conda/lib/python3.8/site-packages/ray/ml/checkpoint.py", line 457, in _get_local_path
    if path is None or is_non_local_path_uri(path):
  File "/opt/conda/lib/python3.8/site-packages/ray/ml/utils/remote_storage.py", line 74, in is_non_local_path_uri
    if bool(get_fs_and_path(uri)[0]):
  File "/opt/conda/lib/python3.8/site-packages/ray/ml/utils/remote_storage.py", line 104, in get_fs_and_path
    fs, path = pyarrow.fs.FileSystem.from_uri(uri)
  File "pyarrow/_fs.pyx", line 463, in pyarrow._fs.FileSystem.from_uri
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: When resolving region for bucket 'placeholder': AWS Error [code 99]: curlCode: 7, Couldn't connect to server
I am only using a Persistent Volume, and I have no idea why I am getting this AWS error even though I am not using any cloud storage.
For context, my tune.run call is as follows:
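from ray import tune
from ray.tune.utils.log import Verbosity

analysis = tune.run(
    obj_fn,
    local_dir="./results",  # results are written to the Persistent Volume
    metric="score",
    mode="max",
    checkpoint_score_attr="score",
    # No syncer is configured, so no cloud storage is used.
    sync_config=tune.SyncConfig(
        syncer=None
    ),
    config={...},
    search_alg=algo,
    num_samples=num_samples,
    verbose=Verbosity.V1_EXPERIMENT,
)
best_trial = analysis.get_best_trial(metric="score", mode="max")
# This is the call that raises the OSError on OpenShift.
logger.info("Best Checkpoint directory: \n{}\n".format(analysis.get_best_checkpoint(best_trial, metric="score", mode="max")))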
On my personal laptop it prints the path of the best checkpoint, but when I run it on OpenShift the error is raised. Can anyone help me understand the cause of the error and a solution for it?
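
The traceback suggests the failure happens only when get_best_checkpoint wraps the path in a TrialCheckpoint, which passes a placeholder URI to pyarrow.fs.FileSystem.from_uri. One workaround that may be worth trying is to select the best path by hand from (path, score) pairs, without building a TrialCheckpoint. The sketch below is untested on the cluster and rests on an assumption: that analysis.get_trial_checkpoints_paths(trial, metric="score") returns such pairs in this Ray version and does not itself touch pyarrow. The helper name pick_best_checkpoint and the example paths are made up for illustration.

```python
from typing import List, Optional, Tuple


def pick_best_checkpoint(
    checkpoints: List[Tuple[str, float]], mode: str = "max"
) -> Optional[str]:
    """Return the path with the best metric value, or None for an empty list.

    `checkpoints` holds (path, metric_value) pairs, the shape that
    analysis.get_trial_checkpoints_paths(trial, metric="score") is
    assumed to return.
    """
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    if not checkpoints:
        return None
    chooser = max if mode == "max" else min
    best_path, _ = chooser(checkpoints, key=lambda pair: pair[1])
    return best_path


# Hypothetical usage with the objects from the tune.run snippet:
#   pairs = analysis.get_trial_checkpoints_paths(best_trial, metric="score")
#   logger.info("Best Checkpoint directory: \n{}\n".format(pick_best_checkpoint(pairs)))

# Stand-alone demonstration with made-up paths and scores.
example = [
    ("./results/trial_a/checkpoint_000001", 0.71),
    ("./results/trial_a/checkpoint_000002", 0.84),
]
print(pick_best_checkpoint(example))
```

If this returns a path on OpenShift, that would support the idea that the error comes from the placeholder URI lookup (which needs outbound network access) rather than from the checkpoints themselves.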