Hi!
I had a similar problem before with everything else (see my previous post), which was fixed by the nightly build.
Now it's Ray Tune's turn: following the getting-started examples to test it on a cluster gives me multiple errors. Perhaps I am missing something here…
For example, with this basic code:
from ray import tune

# Results directory passed to tune.run (placeholder value; the real one is not shown in this post).
local_dir = "/tmp/ray_results"

# 1. Define an objective function.
def objective(config):
    score = config["a"] ** 2 + config["b"]
    return {"score": score}

# 2. Define a search space.
search_space = {
    "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
    "b": tune.choice([1, 2, 3]),
}

# 3. Start a Tune run and print the best result.
analysis = tune.run(objective, config=search_space, local_dir=local_dir)
print(analysis.get_best_config(metric="score", mode="min"))
It starts, runs one trial of the objective, and then fails with this error (the uid in the last line is the user ID from the machine I'm executing the code from):
RayTaskError(TuneError): ray::run() (pid=15453, ip=10.99.11.75)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 877, in _on_training_result
    self._process_trial_results(trial, result)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 961, in _process_trial_results
    decision = self._process_trial_result(trial, result)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 1016, in _process_trial_result
    self._callbacks.on_trial_result(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/callback.py", line 268, in on_trial_result
    callback.on_trial_result(**info)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/syncer.py", line 577, in on_trial_result
    trial_syncer.sync_down_if_needed()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/syncer.py", line 352, in sync_down_if_needed
    return super(NodeSyncer, self).sync_down_if_needed(SYNC_PERIOD, exclude=exclude)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/syncer.py", line 237, in sync_down_if_needed
    self.sync_down(exclude)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/syncer.py", line 374, in sync_down
    logger.debug("Syncing from %s to %s", self._remote_path, self._local_dir)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/syncer.py", line 379, in _remote_path
    ssh_user = get_ssh_user()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/cluster_info.py", line 19, in get_ssh_user
    return getpass.getuser()
  File "/home/ray/anaconda3/lib/python3.8/getpass.py", line 169, in getuser
    return pwd.getpwuid(os.getuid())[0]
KeyError: 'getpwuid(): uid not found: 12574'
During handling of the above exception, another exception occurred:
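The last frames show Tune's `get_ssh_user()` calling `getpass.getuser()`, which only falls back to the password database (`pwd.getpwuid(os.getuid())`) when none of the `LOGNAME`, `USER`, `LNAME`, `USERNAME` environment variables is set. Below is a minimal sketch of that lookup order that could be run on an affected node to check what it sees. It is my own reconstruction for diagnosis, not Ray code, and the function name `resolve_user` is hypothetical.

```python
import os
import pwd


def resolve_user(environ=None):
    """Mimic the lookup order of getpass.getuser() (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # getpass.getuser() checks these variables in this order and
    # returns the first non-empty one.
    for name in ("LOGNAME", "USER", "LNAME", "USERNAME"):
        user = environ.get(name)
        if user:
            return user
    # Fallback: raises KeyError if the current uid has no passwd entry,
    # which matches the "getpwuid(): uid not found" error above.
    return pwd.getpwuid(os.getuid())[0]


if __name__ == "__main__":
    print("uid:", os.getuid())
    try:
        print("user:", resolve_user())
    except KeyError as exc:
        print("lookup failed:", exc)
```

If this prints "lookup failed" on a node, that node has no passwd entry for the uid and none of those variables is set there.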
Many thanks for any help!