How do I use Ray Serve with a remote Ray cluster?

I am trying to deploy a FastAPI service on AWS. I want a CPU-only machine as the head node and GPU machines as the workers, so that GPU usage can scale down to 0 when there is no demand.

I have deployed a cluster with the following YAML:

```yaml
cluster_name: ray_test_2
max_workers: 1
upscaling_speed: 1.0
docker:
    container_name: "ray_container"
    pull_before_run: True
    head_image: "AWSECR/Image_1_CPU_based:latest"
    head_run_options:   # Extra options to pass into "docker run"
        - --ulimit nofile=65536:65536

    worker_image: "AWSECR/Image_2_GPU_based:latest"
    worker_run_options:
        - --ulimit nofile=65536:65536
        - --gpus all

idle_timeout_minutes: 10

provider:
    type: aws
    region: us-east-1
    cache_stopped_nodes: True # If not present, the default is True.

auth:
    ssh_user: ubuntu

available_node_types:
    ray.head.default:
        resources: {}
        node_config:
            InstanceType: t2.large
            ImageId: ami-053b0d53c279acc90
            BlockDeviceMappings:
                - DeviceName: /dev/sda1
                  Ebs:
                      VolumeSize: 150
                      VolumeType: gp3
    ray.worker.default:
        min_workers: 0
        max_workers: 1
        resources: {}
        node_config:
            InstanceType: g4dn.2xlarge
            ImageId: ami-05543abe7b00118ff
            BlockDeviceMappings:
                - DeviceName: /dev/sda1
                  Ebs:
                      VolumeSize: 150
                      VolumeType: gp3
            InstanceMarketOptions:
                MarketType: spot

head_node_type: ray.head.default

file_mounts: {
}

cluster_synced_files: []

file_mounts_sync_continuously: False

rsync_exclude:
    - "**/.git"
    - "**/.git/**"

rsync_filter:
    - ".gitignore"

initialization_commands:
    - sudo apt install awscli -y
    - aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin *********.dkr.ecr.us-east-1.amazonaws.com

setup_commands: []
head_setup_commands: []
worker_setup_commands: []
head_start_ray_commands:
    - ray stop
    - ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host=0.0.0.0
worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
```
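For reference, I bring the cluster up with the standard Ray autoscaler CLI (the YAML file name below is just what I called it locally):

```shell
# Launch (or update) the cluster from the YAML above.
ray up ray_test_2.yaml -y

# Tail the autoscaler logs to confirm the head node came up.
ray monitor ray_test_2.yaml

# Open an SSH session on the head node if needed.
ray attach ray_test_2.yaml
```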

When the cluster is up, it creates only one **t2.large** instance, as it should.

How do I now deploy my FastAPI application with Ray Serve on this cluster?

I am not able to connect my servefile.py (containing my Ray Serve model/app) to the cluster.

This is my servefile.py:

```python
import ray
import requests
from ray import serve
from typing import Dict
from sss import base_run
from fastapi import FastAPI
from pydantic import BaseModel


class APIInput(BaseModel):
    s3_path: str = 's3://bucket/ABC.json'


ray.init(address='auto')

app = FastAPI()


@app.get("/")
def get_server_status():
    return {"Server Status": "ON"}


@app.post("/imagine/")
def root(item: APIInput):
    config = {'s3_path': item.s3_path}
    return {"output_path": base_run(config)}


@serve.deployment(
    autoscaling_config={
        "min_replicas": 0,
        "initial_replicas": 0,
        "max_replicas": 1,
    },
    route_prefix="/",
    ray_actor_options={
        "num_cpus": 3,
        "num_gpus": 1,
    },
)
@serve.ingress(app)
class FastAPIWrapper:
    pass


ray_app = FastAPIWrapper.bind()
serve.run(ray_app)
```

Any help is appreciated, thanks!