Ray: 1.4
When creating and deploying a model such as
SomeRayImageModel.options(init_args=model_args, num_replicas=NUM_REPS, name=model_id).deploy()
what is a "best practice" or suggestion for error handling? For example, let's say that SomeRayImageModel initializes the model as part of its creation and there is an exception (e.g., using PyTorch and the model download fails).
In my specific case, I see the exception thrown in __init__, and the Ray Serve framework reports:
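To make the scenario concrete, here is a minimal sketch of the kind of constructor I mean. It uses no Ray calls at all. The names `SomeImageModel`, `ModelLoadError`, `loader`, and `failing_loader` are hypothetical stand-ins, not my actual code or any Ray API, and the failing loader only simulates the download error without touching the network.

```python
import logging

logger = logging.getLogger(__name__)


class ModelLoadError(RuntimeError):
    """Hypothetical error type wrapping any failure while loading the model."""


class SomeImageModel:
    """Stand-in for the class that would be deployed as a replica."""

    def __init__(self, loader):
        # `loader` is a hypothetical callable that downloads/builds the model,
        # e.g. something that fetches pretrained weights.
        try:
            self.model = loader()
        except Exception as exc:
            logger.error("exception during model creation: %s", exc)
            # Re-raising makes the constructor itself fail.
            raise ModelLoadError(str(exc)) from exc


def failing_loader():
    # Simulates the download failure without any network access.
    raise OSError("nodename nor servname provided, or not known")
```

Constructing `SomeImageModel(failing_loader)` logs the error and then raises from the constructor, which is the situation I am asking about.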
2021-07-22 09:09:53.566 | ERROR | app.inference.pytorch.base_image_model:__init__:20 - OOPS...exception during model creation <urlopen error [Errno 8] nodename nor servname provided, or not known>
(pid=13782) 2021-07-22 09:09:53,571 ERROR worker.py:418 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::HrTJVO:SERVE_CONTROLLER_ACTOR:wide_resnet50_2#QZFjIF:RayServeWrappedReplica.__init__ (pid=13782, ip=127.0.0.1)
(pid=13782) File "/Users/developer/.pyenv/versions/3.8.6/lib/python3.8/http/client.py", line 1255, in request
(pid=13782) self._send_request(method, url, body, headers, encode_chunked)
It then seems to get stuck in a loop where it keeps trying to create and deploy the same model.
Any suggestions on error handling when creating and deploying models with Ray Serve?
Thanks