ConnectionError: Cannot send request due to data channel shutting down

We've been using the Ray client successfully and are now stress testing Ray.

When we try to open 75+ connections (each from a separate container), we start getting the following error on connect:

“ConnectionError: Cannot send request due to data channel shutting down.”

Has anyone encountered this before? Are there connection limits built into the client? Any help would be appreciated!


In case anyone encounters this error in the future: the Ray client server caps its number of threads by default. I raised the limit by setting the RAY_CLIENT_SERVER_MAX_THREADS environment variable accordingly.
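A minimal sketch of applying that fix. It assumes, as the variable's name suggests, that RAY_CLIENT_SERVER_MAX_THREADS is read by the client server process that `ray start --head` launches. If so, the variable has to be in the head node's environment, not in the containers that call `ray.init("ray://...")`. The helper names are hypothetical, and the value 500 below is purely illustrative, not a recommendation.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def head_node_launch(max_threads: int) -> Tuple[List[str], Dict[str, str]]:
    """Return the command and environment for starting a Ray head node with a
    raised client-server thread cap (hypothetical helper).

    Assumption: RAY_CLIENT_SERVER_MAX_THREADS is read by the Ray client server
    on the head node, so it must be set there rather than on the clients.
    """
    if max_threads <= 0:
        raise ValueError("max_threads must be positive")
    env = dict(os.environ)  # copy, so the caller's own environment is untouched
    env["RAY_CLIENT_SERVER_MAX_THREADS"] = str(max_threads)
    return ["ray", "start", "--head"], env


def start_head_node(max_threads: int) -> None:
    """Start the head node; requires the `ray` CLI to be on PATH."""
    cmd, env = head_node_launch(max_threads)
    subprocess.run(cmd, env=env, check=True)
```

On the head container you would call `start_head_node(500)`. In an ECS setup, the equivalent is probably just adding the variable to the head task definition's environment.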

Hey @samrogers226, could you say more about your use case?

I implemented that env var and didn’t realize there’d be so many connections!

@rliaw, absolutely.

For some broad context: we store all of our data in Parquet, and users want to be able to access it quickly (several million rows returned in under 10 seconds). It's not really a big-data problem. We're experimenting with Ray as our data access layer. It reads in and transforms partitions and then returns the results to the user.

We've set up an ECS cluster that runs Ray (the default Ray cluster setup doesn't quite fit our needs, for a few reasons). Ray has worked well for us so far. However, we want to test how it behaves under more realistic conditions (hundreds of users requesting data simultaneously), so we've created a job that opens connections via the Ray client and runs jobs through them.
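A minimal sketch of such a load test follows. Each simulated user connects with `ray.init("ray://...")` in its own process, standing in for a separate container. The head address, the `run_user_session`/`load_test` names, and the `ray.put`/`ray.get` workload are all hypothetical placeholders. Failures such as the `ConnectionError` above are tallied rather than raised, so one run reports how many sessions failed at a given concurrency.

```python
from collections import Counter
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable

# Hypothetical address of the head node's Ray client server (10001 is Ray's default client port).
HEAD_ADDRESS = "ray://ray-head:10001"


def run_user_session(address: str) -> str:
    """One simulated user: connect through the Ray client, run a tiny job, disconnect.

    Returns "ok" on success, otherwise the exception's class name (for example
    "ConnectionError"), so failures are counted instead of aborting the run.
    """
    import ray  # imported here so the harness can be exercised without Ray installed

    try:
        ray.init(address)
        ray.get(ray.put(b"ping"))  # placeholder for a real data-access request
        return "ok"
    except Exception as exc:  # tally every failure mode, not just ConnectionError
        return type(exc).__name__
    finally:
        ray.shutdown()


def load_test(
    n_users: int,
    session: Callable[[str], str] = run_user_session,
    address: str = HEAD_ADDRESS,
    executor_cls: Callable[..., Executor] = ProcessPoolExecutor,
) -> Counter:
    """Run n_users sessions concurrently and count their outcomes.

    The default ray.init connection is global to a process, so real sessions
    get one process each; a thread pool is enough for a fake session.
    """
    with executor_cls(max_workers=n_users) as pool:
        return Counter(pool.map(session, [address] * n_users))
```

Call `print(load_test(100))` under an `if __name__ == "__main__":` guard, since `ProcessPoolExecutor` may re-import the script in each worker. Stepping `n_users` up from a small number should show roughly where the failures begin. Running all sessions from one machine is only an approximation of many separate containers.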

Slightly unrelated, but the bottleneck we're now seeing is at the Ray head node: it seems slow to return results at scale. We're fairly sure this isn't a resource constraint. Are you aware of any parameters or configuration settings worth exploring here?

Hmm, yeah, having the Ray client server be the single entry point to the cluster may be an issue, since every client request and every fetched result is routed through the head node.

Maybe you could try using Ray Serve? cc @simon-mo @eoakes

We have also run into this problem recently. Our use case is an analytics web app in which our users run analytics powered by a Ray cluster. Is there any information on how well the Ray client server/head node scales with simultaneous connections?

cc @ijrsvt to take a look here