Hello, I’m new to Ray and I am trying to deploy my model using Ray Serve.
This is my inference code:
```python
from typing import List

import numpy as np
from ray import serve


@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 0.3, "num_cpus": 1})
class InferObjectTRT(InferenceWrapper):
    # Model parameters
    NTIMES_WARMUP = 6

    def __init__(self):
        self.loadmodel()

    def inference(self, image_batch):
        return self.infer(image_batch)

    async def handle_batch(self, images: List[np.ndarray]):
        print(f"Detect: Processing a total of {len(images)} images")
        # Stack the per-request arrays into a single batch for the model.
        images = np.concatenate(images, axis=0).astype(np.float32)
        predict = self.inference(images)
        return predict

    @serve.batch(max_batch_size=5, batch_wait_timeout_s=0)
    async def predict(self, image: np.ndarray):
        print("image in predict func", len(image))
        return await self.handle_batch(image)
```
And this is how I call it:

```python
handle = serve.run(InferObjectTRT.bind())
results = await handle.predict.remote(img_detrz)
```
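For reference, `img_detrz` holds all nine images in one NumPy array; a hypothetical stand-in for my real input would be:

```python
import numpy as np

# Stand-in for my real input: a single array containing all 9 images at once.
img_detrz = np.random.rand(9, 418, 418, 3).astype(np.float32)
```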
My problem is that my input has shape [9 x 418 x 418 x 3], but when I print the length of the input inside `predict`, I get 1.
If I understand correctly, the input should be split into two batches, [5 x 418 x 418 x 3] and [4 x 418 x 418 x 3], like in this example.
Do I misunderstand how the batching works, or did I do something incorrect here?
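To make the question concrete, here is a minimal standalone sketch of the behavior I expected, modeled on the `serve.batch` pattern from the docs (the `ToyBatcher` name is hypothetical, and it assumes a recent Ray Serve 2.x where `serve.run` returns a handle whose responses support `.result()`):

```python
from typing import List

import numpy as np
from ray import serve


@serve.deployment
class ToyBatcher:
    @serve.batch(max_batch_size=5, batch_wait_timeout_s=0.1)
    async def predict(self, images: List[np.ndarray]):
        # serve.batch calls this once per batch of queued requests.
        print(f"batch size: {len(images)}")
        # Return one result per queued request.
        return [img.shape for img in images]


handle = serve.run(ToyBatcher.bind())

img = np.zeros((1, 418, 418, 3), dtype=np.float32)
# Nine separate requests, sent without waiting in between.
responses = [handle.predict.remote(img) for _ in range(9)]
print([r.result() for r in responses])
```

With nine near-simultaneous single-image requests, I would expect this to print batch sizes of roughly 5 and 4.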
I appreciate your help.