Serve batching not working

Hello, I’m new to Ray. I am trying to deploy my model using Ray Serve.
This is my inference code:

from typing import List

import numpy as np
from ray import serve


@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 0.3, "num_cpus": 1})
class InferObjectTRT(InferenceWrapper):  # InferenceWrapper provides loadmodel() and infer()
    # Model parameters
    NTIMES_WARMUP = 6

    def __init__(self):
        self.loadmodel()

    def inference(self, image_batch):
        return self.infer(image_batch)

    async def handle_batch(self, images: List[np.ndarray]):
        print(f"Detect: Processing a total of {len(images)} images")
        images = np.concatenate(images, axis=0).astype(np.float32)
        predict = self.inference(images)
        return predict

    @serve.batch(max_batch_size=5, batch_wait_timeout_s=0)
    async def predict(self, image: np.ndarray):
        print("image in predict func", len(image))
        return await self.handle_batch(image)

And this is how I call it:

self.handler = InferObjectTRT()
results = await self.handler.predict.remote(img_detrz)

My problem is that my input has size [9 x 418 x 418 x 3], but inside the predict function I printed the length of the input and only got 1.

If I understand correctly, the input should be split into two batches, [5 x 418 x 418 x 3] and [4 x 418 x 418 x 3], as in this example.
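For reference, this is a small sketch of the split I expected to see (my assumption, not actual Serve output; img_detrz here is just a placeholder array with my input shape):

import numpy as np

# Assumed behavior: with max_batch_size=5, a [9 x 418 x 418 x 3] input
# would be chunked into one batch of 5 and one batch of 4.
img_detrz = np.zeros((9, 418, 418, 3), dtype=np.float32)  # placeholder input
expected_batches = np.array_split(img_detrz, [5], axis=0)
print([b.shape for b in expected_batches])
# [(5, 418, 418, 3), (4, 418, 418, 3)]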

Do I misunderstand something, or did I do something incorrectly here?

I appreciate your help.