- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
We have the following situation running Ray using KubeRay:
- Local machine has up-to-date version of code
- Using Ray Client, we submit a function for parallel execution on Ray (the function is imported into the module where the Ray calls are made)
- The Ray Head has not been updated recently, so it does not have that function present. Our Head defers all computation to the Workers, which are created from the container image containing the newest code, so execution always works on the Workers.
- However, despite deferring all computation to Workers, a `ModuleNotFoundError` is thrown.
Our expectation was that the execution would go like this:
- The local machine `cloudpickle`s the import path to the function we want to run, generating a BLOB
- Ray stores a mapping between some UID and the BLOB in the GCS
- The UID and relevant args are sent to Ray Worker pods
- The Ray Worker pod fetches the BLOB using the UID, then deserializes it locally, resolving the import path to the actual code present on that pod.
However, it seems that the Ray Head is also deserializing the BLOB at some point, which fails the job because the Ray Head doesn’t have the new module on it yet.
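To illustrate what we think is going on (a rough sketch using cloudpickle directly, not Ray's actual internals): a function defined in an importable module is pickled by reference to its import path, so any process that later deserializes the BLOB must be able to import that module itself, which the Head currently cannot.
import cloudpickle
from some_module import some_fn_parallel    # new code, importable on the local machine

blob = cloudpickle.dumps(some_fn_parallel)  # stores a reference like "some_module.some_fn_parallel"
fn = cloudpickle.loads(blob)                # re-imports some_module; raises ModuleNotFoundError
                                            # on any host where some_module is missing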
Any ideas why this happens, from an architectural perspective? And is there a way for us to avoid any deserialization occurring on the Ray Head?
cc: @timothyngo
Example of what the relevant code would look like:
# run.py
import ray
from some_module import some_fn_parallel

# Connect to the cluster via Ray Client (address is a placeholder)
ray.init("ray://<head-address>:10001")
some_fn_parallel(range(10))  # arbitrary example arguments

# some_module.py
import ray

@ray.remote
def some_fn(x):
    pass

def some_fn_parallel(args):
    ray.get([some_fn.remote(x) for x in args])