Modified global variable between consecutive remote calls

Hi,
I’m having a hard time understanding how Ray deals with global variables.
In the following code, what is the right way to make the second call to foo1 see the updated “a”? Do I need to pass the global variable as a parameter?

import ray
import time

a = 3

@ray.remote
def foo1(inp):
    time.sleep(10)
    return a + inp
    
def foo2(inp):
    time.sleep(10)
    return a + inp

def main():
    global a
    ray.init(include_dashboard=False, num_cpus=4)
    
    x1 = foo1.remote(0)
    a = 100
    print(foo2(2))
    x2 = foo1.remote(1)
    r1 = ray.get(x1)
    r2 = ray.get(x2)
    print(r1, r2)

if __name__ == "__main__":
    main()

My expected output was:
102
3 101

But I get:
102
3 4

Please refer to the Ray Design Patterns document (Google Docs) for more details!

The problem with using a global variable is that it won’t be “shared” across processes!

Oh! I thought that Ray serialized the variables right before each remote call.
If I’m not wrong, if “a” is a local variable instead of a global one and I pass it as a parameter, then the worker will see the update. Also, it is recommended to use “put” when sharing a variable across multiple workers, to avoid pickling it multiple times.

In this case, is it correct to use the lambda function? If I pass a more complex variable instead of “i”, do I need to transform it beforehand, or does Ray just do a single put on its own?

import ray
from ray import tune

def objective(config, alpha, checkpoint_dir=None):
    a = config["a"]
    b = config["b"]
    loss = a**alpha + b**alpha
    tune.report(loss=loss)


def main():
    ray.init(include_dashboard=False, num_cpus=4)
    
    for i in range(1, 5):
        config={
            "a": tune.randint(i*5, (i+1)*5),
            "b": tune.randint(i*5, (i+1)*5)
        }
        analysis = tune.run(
            lambda config, checkpoint_dir=None: objective(config, i, checkpoint_dir),
            name=f"alpha_{i}",
            num_samples=8,
            config=config,
            verbose=2
        )
        best_config = analysis.get_best_config(metric="loss", mode="min")

        print("Best config: ", best_config)
        print()
        print()

if __name__ == "__main__":
    main()

If you’d like to share the variable among tasks, you can call ray.put and pass the resulting reference to the tasks. But note that Ray objects (objects created by ray.put or returned by Ray tasks) are immutable.

By the way, I misunderstood your original question. You can certainly pass the global variable, as long as you don’t mutate its value.

The reason the two foo1 calls return 3 and 4 is that the function has already been serialized and exported to the workers by the time the Python code reaches that line. So foo1 was already imported by the workers with a = 3.
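To make that behaviour easier to picture, here is a rough standard-library analogy. It does not use Ray and is not how Ray is implemented. The hypothetical helper export_snapshot rebuilds a function over a copy of its globals, which imitates a function that was shipped to another process together with the values it referenced at that time.

```python
import types

a = 3


def foo(inp):
    return a + inp


def export_snapshot(func):
    # Hypothetical helper: bind the same code object to a *copy* of the
    # function's globals, so later changes in this module are not seen.
    frozen_globals = dict(func.__globals__)
    return types.FunctionType(
        func.__code__,
        frozen_globals,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )


# "Export" the function while a is still 3.
remote_like_foo = export_snapshot(foo)

# Mutate the global afterwards, as main() does in the question.
a = 100

local_result = foo(2)                    # uses the live global
remote_like_result = remote_like_foo(1)  # uses the frozen copy

print(local_result, remote_like_result)
```

The local call sees the new value, while the snapshot keeps computing with the old one, which matches the 102 and 4 in the output above.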
