Just had some quick questions about the difference between running Ray work in separate processes with remote functions and running it through the multiprocessing Pool. Why doesn't the multiprocessing Pool use remote functions? Also, why does the multiprocessing Pool only let me run as many processes as there are CPUs across the allocated cluster nodes, while I can spin up hundreds of processes in a for loop with calls to a remote function?
They’re pretty much the same! Ray’s multiprocessing API is really just a thin wrapper around remote functions/actors.
The main reason we have the multiprocessing API is just to make it super easy to move your multiprocessing code over.
If you’re not super tied to the multiprocessing API, you could also consider using a Ray utility like ActorPool instead: Using Actors — Ray v2.0.0.dev0