I have enabled autoscaling in my Serve deployment.
My input has shape (batch_size, 512, 512).
I want to split it evenly along the batch dimension and pass each chunk to `handle.infer.remote()`, so that my request runs on multiple replicas in parallel.
So my question is: how can I get the current number of replicas from a Serve deployment handle at runtime?
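```python
class FastAPIDeployment:
    def __init__(self, sd_model_handles) -> None:
        self.sd_handles = sd_model_handles
        for key, handle in self.sd_handles.items():
            print("key:", key)
            # I need the number of replicas here; `number_of_replica` is not a
            # real handle attribute, it is the value I'm looking for
            print("handle", handle.number_of_replica)


for model_name in model_name_list:
    fastapi_deployment = FastAPIDeployment.bind(sd_handles)
```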