Received a GOAWAY with error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"

Any idea what could be causing this error?

Priority: Urgent

(BaseWorkerMixin pid=22821) Run configs dump to /redacted/run_config/2022-07-26T07:28:18Z/run.config
(BaseWorkerMixin pid=22821) <class 'torch.nn.parallel.distributed.DistributedDataParallel'> do not support net config
(BaseWorkerMixin pid=2804, ip=10.1.239.40) Run configs dump to /redacted//imagenet_detection/train_supernet_distributed/run_config/2022-07-26T07:29:12Z/run.config
(BaseWorkerMixin pid=2804, ip=10.1.239.40) <class 'torch.nn.parallel.distributed.DistributedDataParallel'> do not support net config
(BaseWorkerMixin pid=2804, ip=10.1.239.40) E0726 07:30:12.748119487    2837 chttp2_transport.cc:1103]   Received a GOAWAY with error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"

When I run ray stack, I get:

Stack dump for  stan     615666  0.6  0.1 7687148 153488 ?      R    11:42   0:03 ray::BaseWorkerMixin._BaseWorkerMixin__execute()
Process 615666: ray::BaseWorkerMixin._BaseWorkerMixin__execute()
Python v3.7.7 (/home/ray/anaconda3/bin/python3.7)

Error: failed to get os threadid


Stack dump for stan    615746  0.7  0.1 7747964 158280 ?      R    11:42   0:04 ray::BaseWorkerMixin._BaseWorkerMixin__execute()
Process 615746: ray::BaseWorkerMixin._BaseWorkerMixin__execute()
Python v3.7.7 (/home/ray/anaconda3/bin/python3.7)

Error: failed to get os threadid


Stack dump for  stan   615810  0.6  0.1 7682264 144316 ?      R    11:42   0:03 ray::BaseWorkerMixin._BaseWorkerMixin__execute()
Process 615810: ray::BaseWorkerMixin._BaseWorkerMixin__execute()
Python v3.7.7 (/home/ray/anaconda3/bin/python3.7)

Error: failed to get os threadid

Hi @stanleycelestin1,

Sorry for the late reply. We are actively fixing the issue now in ray-project/ray Pull Request #27769 by rickyyx: "[Core] Suppress gRPC server alerting on too many keep-alive pings".
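
For background, this GOAWAY is generic gRPC behavior: a server sends ENHANCE_YOUR_CALM with "too_many_pings" when a client sends keep-alive pings more often than the server permits. Below is a rough sketch of the settings involved, not Ray's actual configuration and not the change in the PR. The channel argument names are standard gRPC ones; the numeric values and the helper names (client_options, server_options, pings_too_frequent) are made up for illustration, and the check is a simplification of the real ping-strike accounting.

```python
# Sketch only: generic gRPC keep-alive settings, not Ray's real configuration.
# All numeric values below are hypothetical and chosen for illustration.

CLIENT_KEEPALIVE_MS = 20_000          # assumed: client sends a ping every 20 s
SERVER_MIN_PING_INTERVAL_MS = 10_000  # assumed: server accepts a ping every 10 s


def client_options(keepalive_ms):
    """Channel arguments controlling how often the client sends pings."""
    return [
        ("grpc.keepalive_time_ms", keepalive_ms),
        ("grpc.keepalive_timeout_ms", 10_000),
        # Allow pings even when no RPC is in flight.
        ("grpc.keepalive_permit_without_calls", 1),
    ]


def server_options(min_ping_interval_ms):
    """Server arguments controlling how many pings it will tolerate."""
    return [
        # Minimum time the server expects between pings on an idle connection.
        ("grpc.http2.min_ping_interval_without_data_ms", min_ping_interval_ms),
        ("grpc.keepalive_permit_without_calls", 1),
    ]


def pings_too_frequent(client_opts, server_opts):
    """Simplified model: True if the client pings faster than the server allows."""
    client = dict(client_opts)
    server = dict(server_opts)
    return (
        client["grpc.keepalive_time_ms"]
        < server["grpc.http2.min_ping_interval_without_data_ms"]
    )


if __name__ == "__main__":
    c = client_options(CLIENT_KEEPALIVE_MS)
    s = server_options(SERVER_MIN_PING_INTERVAL_MS)
    print("client pings too frequent:", pings_too_frequent(c, s))
```

If you use gRPC directly in your own code, lists like these can be passed as options= to grpc.insecure_channel(...) on the client and to grpc.server(...) on the server. A mismatch where the client interval is shorter than the server minimum is the usual way to end up with this GOAWAY.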