Question about logging.info in ray actor

I’ve been experimenting with different parallelization packages, including multiprocessing, billiard, and Ray.

Both multiprocessing and billiard (a fork of multiprocessing) provide a Pool object for mapping a function over a list of items. However, logs emitted in the child processes never reach the log monitor, while those emitted in the parent do. (I am using logging.info to track the status of my code in the child processes.) It may be because the logging object holds a process lock that is shared between processes, which could cause a deadlock and leave some child processes unable to exit cleanly. The good news is that logging inside a Ray Actor somehow does not have this problem.
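Roughly, the setup is like the sketch below (the function name `work` and the inputs are illustrative, not my actual code): logging.info in the parent always shows up, but the records emitted inside the Pool workers may never reach the configured handlers.

```python
import logging
import multiprocessing as mp

logging.basicConfig(level=logging.INFO)

def work(x):
    # Runs in a child process; depending on the start method and handlers,
    # this record may never reach the parent's log output.
    logging.info("processing %s", x)
    return x * x

if __name__ == "__main__":
    logging.info("starting pool in the parent process")  # this always appears
    with mp.Pool(4) as pool:
        results = pool.map(work, range(8))
    logging.info("results: %s", results)
```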

I wonder what the fundamental difference is between a process and a Ray actor. What is the benefit of using Ray over the traditional multiprocessing approaches in Python (e.g., multiprocessing / os.fork …)? Also, how does Ray handle the logging issue mentioned above?

Hey @jeffrey82221, good questions.

I wonder what the fundamental difference is between a process and a Ray actor.

They’re really quite similar. Ray launches each actor in a separate process (tasks are also run in separate processes).
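A quick way to see this for yourself is to compare PIDs from the driver, an actor, and a task (a small illustrative check; the names `PidActor` and `pid_task` are just for this sketch):

```python
import os
import ray

ray.init()

@ray.remote
class PidActor:
    def pid(self):
        return os.getpid()

@ray.remote
def pid_task():
    return os.getpid()

actor = PidActor.remote()
print("driver pid:", os.getpid())            # the process running this script
print("actor pid: ", ray.get(actor.pid.remote()))  # a dedicated worker process
print("task pid:  ", ray.get(pid_task.remote()))   # also a worker process
```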

What is the benefit of using Ray over the traditional multiprocessing approaches in Python (e.g., multiprocessing / os.fork …)?

One advantage is that Ray can scale out to multiple machines. This includes the ray.util.multiprocessing module (see the sketch below). Another is that you can build a larger distributed application using Ray tasks/actors/libraries, whereas multiprocessing is usually used to parallelize a single computation.
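For example, the ray.util.multiprocessing Pool keeps the familiar Pool.map interface, but the workers are Ray actors and can run across a cluster (the `square` function here is just a placeholder):

```python
from ray.util.multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    pool = Pool()  # starts Ray locally if it isn't already running
    print(pool.map(square, range(8)))
```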

Regarding the logging issue, Ray pipes all the stdout/stderr of its worker processes to the driver by default (unless you set ray.init(log_to_driver=False)), which might explain the difference.
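So a minimal sketch like the one below (actor and method names are illustrative) should show the worker's logging.info output in the driver's console, because the worker process's stdout/stderr is streamed back to the driver:

```python
import logging
import ray

ray.init()  # log_to_driver=True is the default

@ray.remote
class Worker:
    def __init__(self):
        # Configure logging inside the worker process itself.
        logging.basicConfig(level=logging.INFO)

    def run(self, x):
        logging.info("worker processing %s", x)  # shows up in the driver's output
        return x * x

w = Worker.remote()
print(ray.get(w.run.remote(3)))
```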