Deploy sklearn machine learning model to Ray cluster on GCP

Newbie on Ray, so this might be a stupid question. I'm using Ray 1.2.0 here.
I am trying to create a sklearn model from the example and deploy it on a GCP Ray cluster.
The GCP cluster is created using the default config.

And here is the simple machine learning model using sklearn (adapted from the Ray example):

from ray import serve
import ray
import pickle
import requests
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

# Train model
iris_dataset = load_iris()
model = GradientBoostingClassifier()
model.fit(iris_dataset["data"], iris_dataset["target"])

class BoostingModel:
    def __init__(self):
        self.model = model
        self.label_list = iris_dataset["target_names"].tolist()

    async def __call__(self, starlette_request):
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)

        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}

if __name__ == '__main__':
    ray.init(address='auto', _redis_password='5241590000000000')
    # Listen on 0.0.0.0 to make the HTTP server accessible from other machines.
    client = serve.start()
    client.create_backend("lr:v1", BoostingModel, config=serve.BackendConfig(num_replicas=2))
    client.create_endpoint("iris_classifier", backend="lr:v1", route="/regressor")

The code works fine locally, and I then submit it to the live GCP Ray cluster using

ray submit [gcp yaml]

Then I hit the following error:

ray.serve.exceptions.RayServeException: Cannot scale backend [] to 1 replicas. Ray Serve tried to add 1 replicas but the resources only allows 0 to be added. To fix this, consider scaling to replica to 0 or add more resources to the cluster.

I've played around with a couple of settings and still can't seem to figure it out.
For example, if I set num_replicas=0 in BackendConfig, then the validator complains:

ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)

Can anyone provide some information on whether this is the right approach to deploying a machine learning Serve application on an existing cloud cluster?

It seems like the issue is there aren’t enough CPUs available on the cluster. Ray Serve 1.2.0 by default uses 2 CPUs internally, so if you’re on a single machine with two cores, that would explain why there aren’t enough resources to start a backend replica.

You can see how many CPUs are being used by checking the Ray Dashboard or by comparing ray.available_resources() and ray.cluster_resources().
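To make that comparison concrete, here's a small sketch (the helper functions and their names are mine; `ray.cluster_resources()` and `ray.available_resources()` are the actual API calls):

```python
def cpus_in_use(total, available):
    # Both arguments are resource dicts in the shape returned by
    # ray.cluster_resources() and ray.available_resources(),
    # e.g. {"CPU": 8.0, "memory": ...}.
    return total.get("CPU", 0.0) - available.get("CPU", 0.0)

def report_cpu_usage():
    # Run this from a node in the cluster; it assumes Ray is installed
    # and a cluster is reachable. (Import kept local so the pure helper
    # above can be used without Ray.)
    import ray
    ray.init(address="auto", ignore_reinit_error=True)
    total = ray.cluster_resources()
    free = ray.available_resources()
    print(f"{cpus_in_use(total, free)} of {total.get('CPU', 0.0)} CPUs in use")
```

If the "in use" number already equals the total before you start any backends, that explains why Serve can't place a replica.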

Another option is to replace client = serve.start() with client = serve.start(http_options={"num_cpus": 0}), which will not reserve a CPU for the internal Serve controller, thus freeing up a CPU for one replica. The error message has been improved for Ray 1.3, sorry it asked you to set num_replicas=0!

Thank you very much for the reply @architkulkarni. As you suggested, I was able to get it to work by increasing the size of the node and the number of CPUs in the resources:

# The resources provided by this node type.
resources: {"CPU": 8}
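For context, this resources field sits in the node definition of the cluster YAML. A rough sketch of the surrounding block, assuming the standard autoscaler layout (the node-type name and machineType here are illustrative, not taken from the thread):

```yaml
# ray_gcp.yaml (illustrative excerpt)
available_node_types:
    ray_head_default:
        # The resources provided by this node type.
        resources: {"CPU": 8}
        node_config:
            machineType: n1-standard-8
```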

Then, once Serve is running on the head node, i.e. after
ray submit ray_gcp.yaml

I did

ray attach ray_gcp.yaml

Once the SSH session was attached, I launched Python and ran the following:

import requests

sample_request_input = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}
response = requests.get(
    "http://localhost:8000/regressor", json=sample_request_input)

However, the following error is returned:

ConnectionError: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /regressor (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb288df4450>: Failed to establish a new connection: [Errno 111] Connection refused'))

I've tried changing localhost to 0.0.0.0 on both the Serve application and the request, and still get the same error.

Now, if I change the URL of the GET request to the external IP of the head node, I get a connection timed out error:

ConnectionError: HTTPConnectionPool(host=[head node external id], port=8000): Max retries exceeded with url: /regressor (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb756f6cad0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

Am I missing something here? Also, if I want to access the Serve application from outside the cluster, what's the best way to do it? It would be lovely if there were a walkthrough of setting up a Serve deployment on a cluster and how to send requests to it.

Thanks again.

Ah, the issue might be that when your script exits, the Ray Serve client goes out of scope, so Ray Serve shuts down. Can you try changing client = serve.start() to client = serve.start(detached=True)? That will make the Serve instance persist even after your script exits.
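For clarity, a minimal sketch of the suggested change (the function wrapper is mine; serve.start(detached=True) is the actual call, and the address/password values are copied from the script above):

```python
def start_detached_serve():
    # Imports kept inside the function only so this snippet reads as a
    # standalone sketch; in a real script they belong at the top of the file.
    import ray
    from ray import serve

    ray.init(address="auto", _redis_password="5241590000000000")
    # detached=True keeps the Serve instance (and its backends) alive
    # after this script exits, instead of shutting down with the client.
    return serve.start(detached=True)
```

The rest of the deployment script (create_backend / create_endpoint) stays the same.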