I need to run applications that require multiprocessing: some deep learning training, some inference, and some text processing, all of which currently run as web services in the same Kubernetes cluster. Each service needs to expose HTTP endpoints, but those endpoints must not be blocked by long-running processing (up to 30 minutes). My idea is therefore to run the long-running work in a different process from the one the HTTP server runs in.

We considered several options, but we are currently experimenting with a FastAPI app served by gunicorn; the app also initializes Ray in local mode and runs the long job in a remote task. In the future we will also need to optimize the processing and could use multiple workers, but for now one worker and one FastAPI app are enough.

This seems to work very well, but my concern is scale: we have several services, each running in a pod in the same K8s cluster. Running 10 services with 3 instances each means 30 pods, each with its own local-mode Ray. Is this going to cause any trouble? The alternative would be to run two processes with the standard Python API, but we would lose a lot of the benefits for parallel processing. Do you think we are going to have problems with this setup?
cc @Dmitri Can you follow up with him?
There’s no problem with using single-node Ray for multiprocessing in a K8s pod.