It would be great to have better documentation on how to use Ray for production workloads. The Ray programming model is well suited to building production systems like web services or serving systems. However, at the moment there is no good centralized place where we collect documentation about that and give insights into how to actually build and run such production systems (some existing docs include "Ray Monitoring" and "Ray Serve: Scalable and Programmable Serving" in the Ray v2.0.0.dev0 documentation).
Here are some concrete ideas for improvements we could make along these lines:
- What are good programming patterns for production applications? Ideally this would expand the Ray Design Patterns Google Doc and emphasize the production patterns in it (e.g. with a section in the doc that links to them so people can find them quickly).
- How do you run Ray production applications on k8s, set up monitoring and health checks, and handle out-of-memory situations? How do you debug out-of-memory problems?
- Have one or more example applications for which we show end-to-end how deployment and monitoring are done (similar to what the GitHub project tiangolo/full-stack-fastapi-postgresql, a full-stack, modern web application generator using FastAPI, PostgreSQL as database, Docker, automatic HTTPS and more, does for FastAPI).
- Describe some case studies on how companies use Ray in production, how they do their deployment and how they deal with the challenges mentioned above.
These are some ideas, and we’d love to hear from people using Ray in production about what their pain points are and where more documentation would help.