Sharing function closure with other nodes

Let’s say I have a ray cluster with multiple nodes connected via network. If I define a remote function on one node, all the variables in its closure will be serialized and sent to the other nodes (correct me if I’m wrong?). That is, for this code:

import ray
from my_file import external_foo

def uses_external(x):
    return external_foo(x+1)

@ray.remote
def task(arg):
    return 2*uses_external(arg+2)

The dependency of the remote function task() is uses_external() which in turn has external_foo() as its own dependency.

My question is: how does synchronization of these dependencies between ray nodes work? How does ray know that it needs to send uses_external to other nodes, as well as external_foo and it’s dependencies?

Hi @Danil_Lykov,

Welcome to the Ray community! We use cloudpickle(GitHub - cloudpipe/cloudpickle: Extended pickling support for Python objects) to serialize functions and it will handle all dependencies automatically.