Deploy sklearn machine learning model to Ray cluster on GCP

Newbie on Ray, so this might be a stupid question. I'm using Ray 1.2.0 here.
I am trying to create a sklearn model from the example and deploy it on a GCP Ray cluster.
The GCP cluster is created using the default config.

And here is the simple machine learning model using sklearn (adapted from the Ray example):

from ray import serve
import ray
import pickle
import requests
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

# Train model
iris_dataset = load_iris()
model = GradientBoostingClassifier()
model.fit(iris_dataset["data"], iris_dataset["target"])

class BoostingModel:
    def __init__(self):
        self.model = model
        self.label_list = iris_dataset["target_names"].tolist()

    async def __call__(self, starlette_request):
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)

        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}

if __name__ == '__main__':
    ray.init(address='auto', _redis_password='5241590000000000')
    # Listen on 0.0.0.0 to make the HTTP server accessible from other machines.
    client = serve.start()
    client.create_backend("lr:v1", BoostingModel, config=serve.BackendConfig(num_replicas=2))
    client.create_endpoint("iris_classifier", backend="lr:v1", route="/regressor")

The code works fine locally, and I then submit it to the live GCP Ray cluster using

ray submit [gcp yaml]

Then I hit the following error:

ray.serve.exceptions.RayServeException: Cannot scale backend [] to 1 replicas. Ray Serve tried to add 1 replicas but the resources only allows 0 to be added. To fix this, consider scaling to replica to 0 or add more resources to the cluster.

I've played around with a couple of settings and still can't seem to figure it out.
For example, if I set num_replicas=0 in BackendConfig, then the validator complains:

ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)

Can anyone provide some information on whether this is the right approach to deploying a machine learning Serve application on an existing cloud cluster?

It seems like the issue is there aren’t enough CPUs available on the cluster. Ray Serve 1.2.0 by default uses 2 CPUs internally, so if you’re on a single machine with two cores, that would explain why there aren’t enough resources to start a backend replica.

You can see how many CPUs are being used by checking the Ray Dashboard or by comparing ray.available_resources() and ray.cluster_resources().
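To make that comparison concrete, here's a small sketch (the helper functions and their names are mine; `ray.cluster_resources()` and `ray.available_resources()` are the actual API calls):

```python
def cpus_in_use(total, available):
    # Both arguments are resource dicts in the shape returned by
    # ray.cluster_resources() and ray.available_resources(),
    # e.g. {"CPU": 8.0, "memory": ...}.
    return total.get("CPU", 0.0) - available.get("CPU", 0.0)

def report_cpu_usage():
    # Run this from a node in the cluster; it assumes Ray is installed
    # and a cluster is reachable. (Import kept local so the pure helper
    # above can be used without Ray.)
    import ray
    ray.init(address="auto", ignore_reinit_error=True)
    total = ray.cluster_resources()
    free = ray.available_resources()
    print(f"{cpus_in_use(total, free)} of {total.get('CPU', 0.0)} CPUs in use")
```

If the "in use" number already equals the total before you start any backends, that explains why Serve can't place a replica.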

Another option is to replace client = serve.start() with client = serve.start(http_options={"num_cpus": 0}), which will not reserve a CPU for the internal Serve controller, thus freeing up a CPU for one replica. The error message has been improved for Ray 1.3, sorry it asked you to set num_replicas=0!

Thank you very much for the reply @architkulkarni. As you suggested, I was able to get it to work by increasing the size of the node and the number of CPUs in the resources:

# The resources provided by this node type.
resources: {"CPU": 8}
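For context, this resources field sits in the node definition of the cluster YAML. A rough sketch of the surrounding block, assuming the standard autoscaler layout (the node-type name and machineType here are illustrative, not taken from the thread):

```yaml
# ray_gcp.yaml (illustrative excerpt)
available_node_types:
    ray_head_default:
        # The resources provided by this node type.
        resources: {"CPU": 8}
        node_config:
            machineType: n1-standard-8
```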

Then, once Serve is running on the head node, i.e. after
ray submit ray_gcp.yaml

I did

ray attach ray_gcp.yaml

Once the SSH session was attached, I launched Python and ran the following:

import requests

sample_request_input = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}
response = requests.get(
    "http://localhost:8000/regressor", json=sample_request_input)

However, the following error is returned:

ConnectionError: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /regressor (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb288df4450>: Failed to establish a new connection: [Errno 111] Connection refused'))

I've tried changing localhost to 0.0.0.0 on both the Serve application and the request, and still get the same error.

Now, if I change the URL of the GET request to the external IP of the head node, I get a connection timed out error:

ConnectionError: HTTPConnectionPool(host=[head node external id], port=8000): Max retries exceeded with url: /regressor (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb756f6cad0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

Am I missing something here? Also, if I want to access the Serve application from outside the cluster, what's the best way to do it? It would be lovely if there were a walkthrough of setting up a Serve deployment on a cluster and how to send requests to it.

Thanks again.

Ah, the issue might be that when your script exits, the Ray Serve client goes out of scope, so Ray Serve shuts down. Can you try changing client = serve.start() to client = serve.start(detached=True)? That will make the Serve instance persist even after your script exits.
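For clarity, a minimal sketch of the suggested change (the function wrapper is mine; serve.start(detached=True) is the actual call, and the address/password values are copied from the script above):

```python
def start_detached_serve():
    # Imports kept inside the function only so this snippet reads as a
    # standalone sketch; in a real script they belong at the top of the file.
    import ray
    from ray import serve

    ray.init(address="auto", _redis_password="5241590000000000")
    # detached=True keeps the Serve instance (and its backends) alive
    # after this script exits, instead of shutting down with the client.
    return serve.start(detached=True)
```

The rest of the deployment script (create_backend / create_endpoint) stays the same.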