Let’s say I have a ray cluster with multiple nodes connected via network. If I define a remote function on one node, all the variables in its closure will be serialized and sent to the other nodes (correct me if I’m wrong?). That is, for this code:
import ray
from my_file import external_foo
def uses_external(x):
return external_foo(x+1)
@ray.remote
def task(arg):
return 2*uses_external(arg+2)
The dependency of the remote function task() is uses_external() which in turn has external_foo() as its own dependency.
My question is: how does synchronization of these dependencies between ray nodes work? How does ray know that it needs to send uses_external to other nodes, as well as external_foo and it’s dependencies?