Is it possible to share objects between different driver processes?

How severely does this issue affect your experience of using Ray?

  • Medium: It makes completing my task significantly more difficult, but I can work around it.

Hi Ray team,

I am new to Ray; I previously used Dask.

In Dask, there is a feature called Publish Datasets. By publishing a dataset, it is possible to read some data, perform some calculation, and share the result with other colleagues through a named reference. Published datasets continue to reside in distributed memory even after all clients requesting them have disconnected.

I would like to know whether Ray has a similar feature, i.e., is there a way to share objects/datasets between different driver processes?

Thanks!

Hi @Arsenal591, welcome to the community!

Sorry, we are a bit busy working on the Ray 2.0 release, so the reply is a little delayed.

For your question, yes, I believe this is totally doable! I think you need a detached actor to do the job.

Here is a simple example:

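import argparse

import ray
import ray.data  # make the Datasets API available explicitly


@ray.remote
class DatasetStore:
    """Actor that keeps named datasets in a dict."""

    def __init__(self):
        self.ds_store = {}

    def store(self, name, dataset):
        self.ds_store[name] = dataset

    def load(self, name):
        # Returns None if nothing was stored under this name.
        return self.ds_store.get(name)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--save', action='store_true')
    parser.add_argument('--load', action='store_true')
    args = parser.parse_args()

    # Connect to the running cluster; every driver must use the same namespace.
    ray.init(address="auto", namespace="myspace")
    # lifetime="detached" keeps the actor alive after this driver exits;
    # get_if_exists=True reuses the actor if another driver already created it.
    ds_store = DatasetStore.options(name="my_store", lifetime="detached", get_if_exists=True).remote()

    if args.save:
        ds = ray.data.range(10)
        ray.get(ds_store.store.remote('ds1', ds))

    if args.load:
        ds1 = ray.get(ds_store.load.remote('ds1'))
        print(ds1.take(10))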

You can run it like this:

python example.py --save

python example.py --load
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]