I don’t immediately see anything wrong with the setup. You shouldn’t need to “close” the handles or anything like that. The long poll client timeouts and the replica response latency errors are concerning, though, and I would be surprised if they’re not related.
Do you see any errors in the serve controller logs (/tmp/ray/session_latest/logs/controller_<pid>.log)? Also, what does the CPU utilization look like? It could be that the Python processes have very high CPU contention.
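If it helps, here is a rough sketch for pulling suspicious lines out of the controller logs and getting a quick read on CPU load. The helper name `find_error_lines` and the keywords it searches for ("ERROR", "Traceback") are my own assumptions about what is worth flagging, not anything defined by Ray, so adjust them to whatever your logs actually contain. It only uses the Python standard library.

```python
import glob
import os


def find_error_lines(log_dir, pattern="controller_*.log",
                     keywords=("ERROR", "Traceback")):
    """Return (filename, line_number, text) for each log line containing a keyword.

    The default keywords are an assumption about the log contents.
    """
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        # errors="replace" avoids crashing on stray non-UTF-8 bytes in a log.
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if any(k in line for k in keywords):
                    hits.append((os.path.basename(path), lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for name, lineno, text in find_error_lines("/tmp/ray/session_latest/logs"):
        print(f"{name}:{lineno}: {text}")

    # Load average compared to core count is only a rough proxy for CPU
    # contention, and os.getloadavg() is available on Unix-like systems only.
    if hasattr(os, "getloadavg"):
        print("load average (1/5/15 min):", os.getloadavg(),
              "cores:", os.cpu_count())
```

A load average that stays well above the core count would be consistent with the contention theory, but a per-process view (for example from `top`) will tell you more.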
I’d suggest that you file a GitHub issue so we can track this and discuss it there. Please also include any additional logs and details about your setup (such as the Ray version).