[High] How to raise a second grpc replica on the same server?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I created a gRPC application via Ray, but I noticed a bug: after load testing, the method responsible for batching requests blocks until the service is restarted.
I could not figure out what was wrong, so I decided to simply run a second replica.
The server has 2 GPUs, and I wanted to run one gRPC service on each, but I can't.

    @serve.deployment(ray_actor_options={"num_cpus": 12})
    class MyService(test_pb2_grpc.MyServiceServicer, gRPCIngress):

        @serve.batch(max_batch_size=1, batch_wait_timeout_s=0.150)
        async def handle_batch(self, requests: RequestBatch):
            return ["ok" for _ in requests]

        async def Method(self, request, context: grpc.ServicerContext):
            return await self.handle_batch(request)
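If the goal is one replica of the same deployment per GPU, Ray Serve can usually express that with `num_replicas` plus a per-replica `num_gpus` resource request, rather than a second cluster. A minimal configuration sketch, mirroring the class above (the replica count and the `num_gpus` value are my assumptions, not from the original post):

```python
from ray import serve


@serve.deployment(
    num_replicas=2,  # one replica per GPU on this 2-GPU node
    ray_actor_options={"num_cpus": 12, "num_gpus": 1},  # each replica reserves one GPU
)
class MyService(test_pb2_grpc.MyServiceServicer, gRPCIngress):
    ...
```

With `num_gpus=1`, Ray's scheduler places each replica on its own GPU via `CUDA_VISIBLE_DEVICES`, so the two replicas do not contend for the same device.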

I run it like this:

    FROM rayproject/ray-ml:2.3.0-py38-gpu
    CMD ldconfig \
        && OMP_NUM_THREADS=${OMP_NUM_THREADS} ray start --head \
        --port=$RAY_HEAD_PORT \
        # --include-dashboard=false \
        --dashboard-host= \
        --dashboard-port=$RAY_DASHBOARD_PORT \
        && serve run main:stt_deployment

What’s the error message you see?
The question is a bit confusing: are you trying to run 2 gRPC applications on a single cluster, or 2 clusters, each with its own gRPC application?

I’m trying to run 2 gRPC applications on a single cluster using num_replicas.
But nothing works. I had to start 2 head Ray clusters (ray start --head && serve run ... in two Docker containers) with different dashboard ports in order to deploy 2 replicas of the same application.
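For what it's worth, one thing that can block two gRPC ingress replicas on the same node is the listen port: each replica runs its own gRPC server, so both cannot bind the same fixed port. A hedged workaround sketch under that assumption (the `pick_free_port` helper is mine, not a Ray API) lets each replica ask the OS for an unused port in its constructor; the replica would then need to advertise that port to clients, e.g. via a service registry:

```python
import socket


def pick_free_port() -> int:
    """Ask the OS for an unused TCP port.

    Each gRPC ingress replica on the same node can call this before
    starting its server so the two replicas do not fight over one
    fixed port number.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 = let the kernel choose any free port
        return s.getsockname()[1]
```

This only sidesteps the bind conflict; clients still need some way to discover which port each replica chose.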