Expose deployments

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi everyone,

I am trying to serve a very simple MNIST model using Ray Serve on a Google Cloud machine.


I have a Python file with the MNIST deployment logic. The file is more or less as follows:

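import numpy as np
import torch
from ray import serve


@serve.deployment
class MNISTServer:

    def __init__(self):
        self.model = ...  # code to fetch the model

    async def __call__(self, request):
        # Parse the JSON body and turn the nested list into a tensor
        input_array = (await request.json())['array']
        input_array = torch.tensor(input_array)

        # Run inference without tracking gradients
        with torch.no_grad():
            preds = self.model(input_array).cpu().numpy()

        # Pick the most likely class for each image in the batch
        class_ = np.argmax(preds, axis=1)
        return {"class": class_}

mnist_app = MNISTServer.bind()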

Then, in the command line, I do the following:

ray start --head
serve build ray_serve:mnist_app -o serve_config.yaml

This yields the following YAML file:

# This file was generated using the `serve build` command on Ray v2.6.3.

proxy_location: EveryNode

http_options:
  host: 0.0.0.0
  port: 8000

applications:
- name: app1
  route_prefix: /
  import_path: ray_serve:mnist_app
  runtime_env: {}
  deployments:
  - name: MNISTServer

Then I do:

serve deploy serve_config.yaml 
2023-08-24 15:37:22,369 SUCC scripts.py:226 -- 
Sent deploy request successfully.
 * Use `serve status` to check applications' statuses.
 * Use `serve config` to see the current application config(s)

The issue is that, unlike what is written in the documentation:

host and port are HTTP options that determine the host IP address and the port for your Serve application’s HTTP proxies. These are optional settings and can be omitted. By default, the host will be set to 0.0.0.0 to expose your deployments publicly, and the port will be set to 8000. If you’re using Kubernetes, setting host to 0.0.0.0 is necessary to expose your deployments outside the cluster.

This is not exposing the deployment publicly. When I run the following on the VM:

>>> from PIL import Image
>>> import numpy as np
>>> import requests
>>> im = Image.open('8222.png')
>>> array = (np.asarray(im)/255).reshape((1, 28, 28)).tolist()
>>> print(requests.post('http://localhost:8000/', json={'array': array}).text)
{"class": [4]}

Everything works fine. But if I send the same request from my laptop to the VM's public IP, I get this error:

>>> print(requests.post('http://<public ip>:8000/').text)
TimeoutError: [Errno 110] Connection timed out

I made sure to allow HTTP traffic in the VM and, since this is my first deployment, I am not sure what I could be missing.

@augusto-peres Does this work on your laptop, just curious?

asking the Serve team cc: @shrekris @Sihan_Wang

This works on both the VM and my laptop when the request is sent to localhost.

I found out that the firewall was not allowing HTTP traffic on port 8000. It was only allowed on port 80.

Adding a firewall rule that allows HTTP traffic on port 8000 solves the problem.
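
For anyone hitting the same timeout: a quick way to tell a firewall problem from a Serve problem is to try a plain TCP connection to the port from outside the VM. Below is a minimal sketch using only the standard library. The helper name `port_is_reachable` is made up for illustration and is not part of Ray; the 3-second default timeout is an arbitrary choice.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it is not part of Ray.
    """
    try:
        # create_connection performs the TCP handshake. A firewall that
        # silently drops packets typically shows up as a timeout, while a
        # port with no listener is usually refused right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution errors.
        return False


# Example usage from the laptop (substitute the VM's public IP):
# for port in (80, 8000):
#     print(port, port_is_reachable("<public ip>", port))
```

If port 80 is reported as reachable but 8000 is not, while the request to localhost on the VM works, the firewall rule is the likely culprit rather than the Serve config.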