No, I did not. What I mean is: run `ray start` once, then run my training script (which corresponds to one set of hyperparams). Then run the training script a second time, with another set of hyperparams, in a new terminal window. @sangcho
```
Traceback (most recent call last):
  File "bandu/torch/train_ray_simple.py", line 841, in <module>
    tune_bandu(num_workers=args.num_workers, use_gpu=args.use_gpu) # If number of GPUs does not match the EC2 GPUs, we may get an error...
  File "bandu/torch/train_ray_simple.py", line 747, in tune_bandu
    queue_trials=True
  File "/home/richard/improbable/venvs/minimum_bandu_venv/lib/python3.6/site-packages/ray/tune/tune.py", line 377, in run
    metric=metric)
  File "/home/richard/improbable/venvs/minimum_bandu_venv/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 140, in __init__
    self._server = TuneServer(self, self._server_port)
  File "/home/richard/improbable/venvs/minimum_bandu_venv/lib/python3.6/site-packages/ray/tune/web_server.py", line 243, in __init__
    self._server = HTTPServer(address, RunnerHandler(runner))
  File "/usr/lib/python3.6/socketserver.py", line 456, in __init__
    self.server_bind()
  File "/usr/lib/python3.6/http/server.py", line 136, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/usr/lib/python3.6/socketserver.py", line 470, in server_bind
    self.socket.bind(self.server_address)
OSError: [Errno 98] Address already in use
```
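A minimal stdlib-only sketch (no Ray involved) of why the second run fails: both scripts try to start a server listening on the same TCP port, and the OS rejects the second `bind` with `EADDRINUSE`, which is errno 98 on Linux. The helpers `bind_tcp` and `reproduce_addr_in_use` are hypothetical, and the two sockets are only assumed stand-ins for the two scripts' `TuneServer` instances.

```python
import errno
import socket


def bind_tcp(port: int) -> socket.socket:
    """Bind and listen on 127.0.0.1:port (port=0 lets the OS choose a free one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    sock.listen()
    return sock


def reproduce_addr_in_use() -> int:
    """Bind the same port twice and return the errno raised by the second bind."""
    first = bind_tcp(0)  # stand-in for the first training script's server
    port = first.getsockname()[1]
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Stand-in for the second training script reusing the same port.
        second.bind(("127.0.0.1", port))
    except OSError as exc:
        return exc.errno
    finally:
        second.close()
        first.close()
    return 0  # the second bind unexpectedly succeeded


if __name__ == "__main__":
    code = reproduce_addr_in_use()
    print(code == errno.EADDRINUSE, errno.errorcode.get(code))
```

The first socket stays in the listening state while the second bind is attempted, which mirrors the first training run still being alive when the second one starts.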
Actually, you can't just randomize the port. When I do, this happens:
```
2021-02-18 21:55:37,843 INFO web_server.py:242 -- Starting Tune Server...
500 response executing GraphQL.
{"error":"Error 1040: Too many connections"}
500 response executing GraphQL.
{"error":"Error 1040: Too many connections"}
500 response executing GraphQL.
{"error":"Error 1040: Too many connections"}
```