I'm new to Ray, so this might be a stupid question. I'm using Ray 1.2.0.
I am trying to create a sklearn model from the example and deploy it on a GCP Ray cluster.
The GCP cluster was created using the default config:
ray/python/ray/autoscaler/gcp/example-full.yaml
And here is a simple machine learning model using sklearn (saved as ray_serve.py), taken from the Ray example:
from ray import serve
import ray
import pickle
import requests
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

# Train model
iris_dataset = load_iris()
model = GradientBoostingClassifier()
model.fit(iris_dataset["data"], iris_dataset["target"])


class BoostingModel:
    def __init__(self):
        self.model = model
        self.label_list = iris_dataset["target_names"].tolist()

    async def __call__(self, starlette_request):
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)
        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}


if __name__ == '__main__':
    ray.init(address='auto', _redis_password='5241590000000000')
    # listen on 0.0.0.0 to make the HTTP server accessible from other machines.
    client = serve.start()
    client.create_backend("lr:v1", BoostingModel, config=serve.BackendConfig(num_replicas=2))
    client.create_endpoint("iris_classifier", backend="lr:v1", route="/regressor")
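For reference, this is roughly how I would expect to query the endpoint once it is up. It is only a sketch: the host and port (127.0.0.1:8000) are my assumption about the Serve defaults, `build_iris_request` is a hypothetical helper name, and the feature values are made up. It uses only the standard library and builds the request without sending it.

```python
import json
import urllib.request


def build_iris_request(base_url, sepal_length, sepal_width, petal_length, petal_width):
    """Build (but do not send) a JSON request for the /regressor route."""
    # The keys match what BoostingModel.__call__ reads from the payload.
    payload = {
        "sepal length": sepal_length,
        "sepal width": sepal_width,
        "petal length": petal_length,
        "petal width": petal_width,
    }
    return urllib.request.Request(
        base_url + "/regressor",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


# Assumed address; adjust to wherever the Serve HTTP server actually listens.
req = build_iris_request("http://127.0.0.1:8000", 1.2, 1.0, 1.1, 1.2)

# With the server running, sending the request would look like:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```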
The code works fine locally. I then submitted it to the live GCP Ray cluster using
ray submit [gcp yaml] ray_serve.py
Then I get the following error:
ray.serve.exceptions.RayServeException: Cannot scale backend to 1 replicas. Ray Serve tried to add 1 replicas but the resources only allows 0 to be added. To fix this, consider scaling to replica to 0 or add more resources to the cluster.
I've played around with a couple of settings and still can't figure it out.
For example, if I set num_replicas=0 in BackendConfig, the validator complains:
num_replicas
ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)
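Going back to the resource error: the message reads as if the cluster had no free resources for a replica at the moment the script ran. Below is a minimal sketch of how I might check that. It assumes each replica needs one CPU, which is my understanding of the Serve default and not something I have verified for 1.2.0. `max_new_replicas` is a hypothetical helper, not part of Ray; the input dict has the shape returned by `ray.available_resources()`.

```python
def max_new_replicas(available, per_replica):
    """Estimate how many replicas fit into the currently free resources.

    available:   dict like the one from ray.available_resources(),
                 e.g. {"CPU": 2.0, "memory": 1e9}
    per_replica: assumed requirement of one replica, e.g. {"CPU": 1}

    This looks at cluster-wide totals only. A replica has to fit on a
    single node, so the real number can be lower than this estimate.
    """
    positive = {name: need for name, need in per_replica.items() if need > 0}
    if not positive:
        raise ValueError("per_replica must request at least one resource")
    # The scarcest resource decides how many replicas fit.
    return min(int(available.get(name, 0) // need) for name, need in positive.items())


# On the cluster this could be called as (requires a running Ray cluster):
# import ray
# ray.init(address="auto")
# print(max_new_replicas(ray.available_resources(), {"CPU": 1}))
```

If this reports 0 on the head node, that would at least match the "resources only allows 0 to be added" part of the error.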
Can anyone tell me whether this is the right approach for deploying a machine learning Serve application on an existing cloud cluster?