[I put this topic under “Ray Tune” because many Ray users I know started their Ray journey with that library. Feel free to change the category.]
People from various communities use Ray for different purposes, ranging from deep learning model training/tuning [rllib, tune, raysgd] and model serving [serve] to custom distributed applications built from scratch [core]. There is also a noticeable trend toward applying Ray to large-scale data processing, which is drawing more attention from data scientists. Data scientists typically start their work in a Jupyter Notebook, but the onboarding experience there can unfortunately be unpleasant, partly due to a lack of documentation. I’m starting this thread to summarize the best practices I have gathered so far. Comments and additions are welcome.
The discussion assumes the AWS ecosystem is being used, but the arguments themselves should be generic.