I’ve experimented with several parallelization packages, including multiprocessing, billiard, and ray.
Both multiprocessing and billiard (a fork of multiprocessing) provide a Pool object for mapping a function over a list of items. However, logs emitted in the child processes are not sent to the log monitor, while those in the parent are. (I am using logging.info to track the status of my code in the child processes.) It may be because logging handlers hold a lock whose state is copied into each child process on fork, which might cause a deadlock and leave some child processes unable to exit cleanly. The good news is that logging inside a Ray actor somehow does not have this problem.
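For concreteness, here is a minimal sketch of one standard-library way to get child-process logs back to the parent: each worker sends its records through a queue (logging.handlers.QueueHandler) and the parent drains that queue with a QueueListener. This is not my actual code. The names square, ListHandler, run_pool, and _init_worker are hypothetical, ListHandler only stands in for the real log monitor, and the sketch assumes a Unix system where the "fork" start method is available.

```python
import logging
import logging.handlers
import multiprocessing as mp


def _init_worker(log_queue):
    """Runs once in each child: send every log record to the parent via the queue."""
    root = logging.getLogger()
    root.handlers.clear()  # drop any handlers inherited from the parent
    root.addHandler(logging.handlers.QueueHandler(log_queue))
    root.setLevel(logging.INFO)


def square(x):
    """Hypothetical task standing in for the real work."""
    logging.info("processing %s", x)
    return x * x


class ListHandler(logging.Handler):
    """Collects messages in a list; a stand-in for the real log monitor."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def run_pool(items, processes=2):
    # Assumption: "fork" is available (Unix). With "spawn", the calling code
    # would also need an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    log_queue = ctx.Queue()
    sink = ListHandler()
    listener = logging.handlers.QueueListener(log_queue, sink)
    pool = ctx.Pool(processes, initializer=_init_worker, initargs=(log_queue,))
    listener.start()  # started after the workers have been forked
    try:
        results = pool.map(square, items)
    finally:
        pool.close()
        pool.join()      # workers exit normally and flush their queued records
        listener.stop()  # drains the remaining records, then stops the thread
    return results, sink.messages
```

Calling run_pool([1, 2, 3]) returns the squared values together with the messages logged in the children. The order of the messages depends on how the work is scheduled across workers, so it should not be relied on.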
I wonder what the fundamental difference is between a process and a Ray actor. What is the benefit of using Ray over the traditional Python approaches to multiprocessing (e.g., using multiprocessing / os.fork …)? Also, how does Ray handle the logging issue mentioned above?