ConnectionError: Cannot send request due to data channel shutting down

samrogers226 · April 27, 2021, 2:42am

Been successfully using the ray client and now in the process of stress testing Ray.

After we attempt to open up 75+ connections (all from separate containers), we begin to get the following error back when trying to connect:

“ConnectionError: Cannot send request due to data channel shutting down.”

Has anyone encountered this before? Are there connection limits built into the client? Any help would be appreciated!

samrogers226 · April 28, 2021, 12:34pm

In case anyone encounters this error in the future: the Ray client server has a maximum number of threads set by default. I raised the RAY_CLIENT_SERVER_MAX_THREADS environment variable accordingly.
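For anyone wanting to try the same thing, here is a minimal sketch of setting that variable. It assumes the variable is read by the process that hosts the client server on the head node, so it has to be in that process's environment at startup. The helper name `head_env` and the value 200 are illustrative only, not something from this thread.

```python
import os


def head_env(max_threads, base_env=None):
    """Return a copy of an environment with the client server thread cap set.

    `head_env` is a hypothetical helper; only the variable name comes
    from the post above.
    """
    if max_threads < 1:
        raise ValueError("max_threads must be a positive integer")
    # Copy so the caller's (or the current process's) environment is untouched.
    env = dict(os.environ if base_env is None else base_env)
    # Environment variable values are always strings.
    env["RAY_CLIENT_SERVER_MAX_THREADS"] = str(max_threads)
    return env


# 200 is an arbitrary example value; size it to your expected connection count.
env = head_env(200, base_env={"PATH": "/usr/bin"})
print(env["RAY_CLIENT_SERVER_MAX_THREADS"])
```

The resulting dict would then be passed to whatever launches the head, for example as the `env=` argument of `subprocess.run(["ray", "start", "--head"], env=env)`, or set as a container environment variable in the task definition.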

rliaw · April 29, 2021, 6:00am

Hey @samrogers226 could you say more about your use case?

I implemented that env var and didn’t realize there’d be so many connections!

samrogers226 · April 29, 2021, 4:34pm

@rliaw, absolutely.

A little bit of broad context: we store all of our data in parquet and users want to be able to access it (several million rows returned in under 10s). Not really a big data problem. We’re experimenting with Ray as our data access layer that reads in and transforms partitions and eventually returns them to the user.

We’ve set up an ECS cluster that is running Ray (the default Ray cluster won’t quite serve our needs for a few reasons). We’ve had good success using Ray so far; however, we want to test what this will look like under more realistic circumstances (100s of users requesting data simultaneously). So we’ve created a job that opens up connections via the Ray client and executes jobs.

Slightly unrelated, but the place where we’re seeing bottlenecks now is the Ray head. It seems slow to return results when working at scale. We’re quite sure that this isn’t a matter of resource constraints. Are you aware of any parameters/configurations here that might be worth exploring?

rliaw · April 29, 2021, 7:58pm

Hmm yeah, having the Ray client server be the single entry point to the cluster may be an issue.

Maybe you could try using Ray Serve? cc @simon-mo @eoakes

Trung_Huynh · July 1, 2021, 7:18pm

We have also encountered this problem recently. Our use case is an analytics web app in which our users run analytics powered by a Ray cluster. Is there any information on how well Ray Client/Head scales with simultaneous connections?

rliaw · July 12, 2021, 7:20pm

cc @ijrsvt to take a look here

samrogers226 · August 13, 2021, 9:25pm

@Trung_Huynh - bit of a late response here, but the approach that we’ve ultimately taken is to create multiple Ray clusters and put them behind a single load balancer. It has been immensely helpful in dealing with some of the limitations we were seeing when running Ray with many client connections.
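To illustrate the idea of spreading connections over several clusters, here is a rough client-side sketch that hands out head addresses round-robin. It is a stand-in for the load balancer described above, not the actual setup: the class name `RoundRobinPicker` and the host names are made up for the example, and a real load balancer would also handle health checks and failover.

```python
import itertools


class RoundRobinPicker:
    """Cycle through a fixed list of cluster head addresses (hypothetical helper)."""

    def __init__(self, addresses):
        addresses = list(addresses)
        if not addresses:
            raise ValueError("at least one address is required")
        # itertools.cycle repeats the list endlessly in order.
        self._cycle = itertools.cycle(addresses)

    def next_address(self):
        """Return the address the next client connection should use."""
        return next(self._cycle)


# Placeholder host names; substitute the heads of your own clusters.
picker = RoundRobinPicker(["head-a.example:10001", "head-b.example:10001"])
picks = [picker.next_address() for _ in range(4)]
print(picks)
```

Each new client would connect to the address it is handed, so no single client server has to hold every connection.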
