So, I checked as you suggested and played around with the value of upscale_delay_s. It seems that Ray always keeps the number of replicas below the maximum I set in the config. For example, when I set max_replicas and max_concurrent_queries to 8, I saw 4 replicas and 4 requests being executed at the same time, but that's not the case when I set them to 5.
Here is my current config:
@serve.deployment(
    ray_actor_options={"num_gpus": 1},
    max_concurrent_queries=5,
    autoscaling_config={
        "target_num_ongoing_requests_per_replica": 1,
        "min_replicas": 0,
        "initial_replicas": 2,
        "max_replicas": 5,
        "upscale_delay_s": 0.1,
        "downscale_delay_s": 10,
    },
)
Still, even though I allow up to 5 replicas, it only creates 3. Am I doing something wrong?
To hit the API, I am using this code:
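For reference, this is roughly how I expect the replica count to come out. It is only a simplified sketch of my own assumption (requests in flight divided by the per-replica target, clamped to min/max), not Ray's actual autoscaling code, and expected_replicas is a name I made up for illustration:

```python
import math


def expected_replicas(ongoing_requests, target_per_replica, min_replicas, max_replicas):
    """Hypothetical helper: a simplified guess at the desired replica count.

    Assumption: desired = ceil(ongoing / target), clamped to [min, max].
    This ignores delays, smoothing, and available cluster resources.
    """
    desired = math.ceil(ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, desired))


# Values taken from the config above: target 1, min 0, max 5.
for in_flight in (1, 3, 5, 10):
    print(in_flight, expected_replicas(in_flight, 1, 0, 5))
```

By that reasoning, 10 requests in flight should ask for 5 replicas, so I wonder whether something else is limiting it, for example the number of GPUs available, since each replica requests one.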
import asyncio
import datetime

import aiohttp


async def send_req(image):
    # Send one image path to the deployment and print the response with a timestamp.
    image_path = f"storage/{image}"
    async with aiohttp.ClientSession() as session:
        response = await session.get("URL of API", data=image_path)
        result = await response.text()
    now = datetime.datetime.now()
    print(f"Result for {image}: {result}", now.minute, now.second)


async def main():
    images = ["image1.jpg", "image2.jpg", "image3.jpg", "image4.jpg", "image5.jpg", "image6.jpg", "image7.jpg", "image8.jpg", "image9.jpg", "image10.jpg"]
    # Fire all requests concurrently so several are in flight at once.
    tasks = [send_req(i) for i in images]
    await asyncio.gather(*tasks)


asyncio.run(main())