I am trying to deploy a FastAPI service on AWS using Ray. I want a CPU-only machine as the head node and GPU machines for all the workers, so that GPU usage can scale down to zero when there is no demand.

I have deployed a cluster with the following YAML:
```yaml
cluster_name: ray_test_2
max_workers: 1
upscaling_speed: 1.0

docker:
    container_name: "ray_container"
    pull_before_run: True
    head_image: "AWSECR/Image_1_CPU_based:latest"
    head_run_options:  # Extra options to pass into "docker run"
        - --ulimit nofile=65536:65536
    worker_image: "AWSECR/Image_2_GPU_based:latest"
    worker_run_options:
        - --ulimit nofile=65536:65536
        - --gpus all

idle_timeout_minutes: 10

provider:
    type: aws
    region: us-east-1
    cache_stopped_nodes: True  # If not present, the default is True.

auth:
    ssh_user: ubuntu

available_node_types:
    ray.head.default:
        resources: {}
        node_config:
            InstanceType: t2.large
            ImageId: ami-053b0d53c279acc90
            BlockDeviceMappings:
                - DeviceName: /dev/sda1
                  Ebs:
                      VolumeSize: 150
                      VolumeType: gp3
    ray.worker.default:
        min_workers: 0
        max_workers: 1
        resources: {}
        node_config:
            InstanceType: g4dn.2xlarge
            ImageId: ami-05543abe7b00118ff
            BlockDeviceMappings:
                - DeviceName: /dev/sda1
                  Ebs:
                      VolumeSize: 150
                      VolumeType: gp3
            InstanceMarketOptions:
                MarketType: spot

head_node_type: ray.head.default

file_mounts: {}

cluster_synced_files: []

file_mounts_sync_continuously: False

rsync_exclude:
    - "**/.git"
    - "**/.git/**"

rsync_filter:
    - ".gitignore"

initialization_commands:
    - sudo apt install awscli -y
    - aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin *********.dkr.ecr.us-east-1.amazonaws.com

setup_commands: []
head_setup_commands: []
worker_setup_commands: []

head_start_ray_commands:
    - ray stop
    - ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host=0.0.0.0

worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
```
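I bring the cluster up with the standard autoscaler CLI (assuming the YAML above is saved as `cluster.yaml`):

```bash
# Launch (or update) the cluster from my laptop
ray up cluster.yaml -y
```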
When the cluster comes up, it creates only one **t2.large** instance, as it should.
How do I now deploy my FastAPI application with Ray Serve on this cluster? I am not able to connect my `servefile.py` (which contains my Ray Serve model/app) to the cluster.
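For reference, this is roughly how I have been trying to run it (a sketch; I'm assuming `cluster.yaml` from above sits in the same directory as `servefile.py`):

```bash
# Upload servefile.py to the head node and execute it there
ray submit cluster.yaml servefile.py
```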
This is my `servefile.py`:
```python
import ray
from ray import serve
from fastapi import FastAPI
from pydantic import BaseModel

from sss import base_run


class APIInput(BaseModel):
    s3_path: str = 's3://bucket/ABC.json'


# Connect to the existing Ray cluster started by the autoscaler.
ray.init(address='auto')

app = FastAPI()


@app.get("/")
def get_server_status():
    return {"Server Status": "ON"}


@app.post("/imagine/")
def root(item: APIInput):
    config = {'s3_path': item.s3_path}
    return {"output_path": base_run(config)}


# Wrap the FastAPI app in a Serve deployment that requests one GPU,
# so replicas should be scheduled on the GPU worker, not the CPU head.
@serve.deployment(
    autoscaling_config={
        "min_replicas": 0,
        "initial_replicas": 0,
        "max_replicas": 1,
    },
    route_prefix="/",
    ray_actor_options={
        "num_cpus": 3,
        "num_gpus": 1,
    },
)
@serve.ingress(app)
class FastAPIWrapper:
    pass


ray_app = FastAPIWrapper.bind()
serve.run(ray_app)
```
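For completeness, this is how I sanity-check the cluster from my laptop (same `cluster.yaml` assumed; `serve status` only reports anything once Serve has actually started):

```bash
# Show autoscaler/cluster state on the head node
ray exec cluster.yaml 'ray status'

# Ask the Serve controller (if any) what is deployed
ray exec cluster.yaml 'serve status'
```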
Any help is appreciated, thanks!