How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I created a gRPC application with Ray Serve, but I hit a bug: after load testing, the method responsible for batching requests blocks until the service is restarted.
I could not figure out what was wrong, so I decided to simply run a second replica.
The server has 2 GPUs, and I wanted to run one gRPC service on each. But I can't.
```python
import grpc
from ray import serve
from ray.serve.drivers import gRPCIngress  # experimental gRPC ingress in Ray 2.3

import test_pb2_grpc  # generated gRPC stubs


@serve.deployment(
    is_driver_deployment=True,
    name="whisper",
    ray_actor_options={"num_cpus": 12},
    num_replicas=2,
)
class MyService(test_pb2_grpc.MyServiceServicer, gRPCIngress):
    ...

    @serve.batch(max_batch_size=1, batch_wait_timeout_s=0.150)
    async def handle_batch(self, requests: RequestBatch):
        return ["ok" for _ in requests]

    async def Method(self, request, context: grpc.ServicerContext):
        return await self.handle_batch(request)
```
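For the goal of one replica per GPU: Serve places each replica as an actor that reserves the resources declared in `ray_actor_options`, so requesting `"num_gpus": 1` together with `num_replicas=2` should pin one replica to each of the 2 GPUs. The accounting behind that can be sketched in plain Python (`replicas_that_fit` is a hypothetical helper for illustration, not a Ray API, and this is not Ray's actual scheduler):

```python
def replicas_that_fit(node_resources, actor_options, num_replicas):
    """Return how many of the requested replicas fit on one node,
    given that each replica reserves `actor_options` worth of resources."""
    fit = num_replicas
    for resource, per_replica in actor_options.items():
        if per_replica > 0:
            fit = min(fit, int(node_resources.get(resource, 0) // per_replica))
    return fit


# A node with 24 CPUs and 2 GPUs, replicas each asking for 12 CPUs + 1 GPU:
node = {"num_cpus": 24, "num_gpus": 2}
options = {"num_cpus": 12, "num_gpus": 1}
print(replicas_that_fit(node, options, 2))  # 2 -> one replica per GPU
```

With only `"num_cpus": 12` in the options (as in the snippet above), nothing ties a replica to a GPU, which may be why both replicas end up competing for one device.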
I run it like this:
```dockerfile
FROM rayproject/ray-ml:2.3.0-py38-gpu
....
CMD ldconfig \
    && OMP_NUM_THREADS=${OMP_NUM_THREADS} ray start --head \
    --port=$RAY_HEAD_PORT \
    # --include-dashboard=false \
    --dashboard-host=0.0.0.0 \
    --dashboard-port=$RAY_DASHBOARD_PORT \
    && serve run main:stt_deployment
```
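For context on the blocking symptom: `@serve.batch` queues incoming calls and a single background coroutine drains them in batches, so if one batch-handler invocation hangs, every later caller waits on the queue indefinitely. A minimal sketch of that pattern (my own re-implementation for illustration, not Ray's code; `Batcher` is hypothetical):

```python
import asyncio


class Batcher:
    """Toy stand-in for @serve.batch: queue calls, drain them in batches."""

    def __init__(self, handler, max_batch_size=1, batch_wait_timeout_s=0.15):
        self.handler = handler
        self.max_batch_size = max_batch_size
        self.batch_wait_timeout_s = batch_wait_timeout_s
        self.queue = asyncio.Queue()
        self.worker = None

    async def __call__(self, request):
        # Lazily start the single drain task on the running loop.
        if self.worker is None:
            self.worker = asyncio.create_task(self._drain())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((request, fut))
        return await fut  # blocks forever if the drain task dies mid-batch

    async def _drain(self):
        while True:
            batch = [await self.queue.get()]
            # Fill the batch until full or batch_wait_timeout_s elapses.
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(await asyncio.wait_for(
                        self.queue.get(), self.batch_wait_timeout_s))
                except asyncio.TimeoutError:
                    break
            results = await self.handler([req for req, _ in batch])
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)


async def main():
    async def handle_batch(requests):
        return ["ok" for _ in requests]

    batcher = Batcher(handle_batch, max_batch_size=4)
    return await asyncio.gather(*(batcher(i) for i in range(3)))


out = asyncio.run(main())
print(out)  # ['ok', 'ok', 'ok']
```

Note the failure mode the sketch shares with the reported bug: if `handler` hangs or raises, the pending futures are never resolved and all subsequent callers block, which matches batching "freezing" until the service is restarted.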