Hi Ray team,

I am new to Ray and I used to use Dask.

In Dask, there is a feature called Publish Datasets. By publishing a dataset, it is possible to read some data, perform some calculation, and share the result with other colleagues through a named reference. Published datasets continue to reside in distributed memory even after all clients requesting them have disconnected.

I would like to know whether there is a similar feature in Ray as well, i.e., is there a way to share objects/datasets between different driver processes?


Hi @Arsenal591, welcome to the community!

For your question, yes, I believe this is totally doable! I think you need a detached actor to do the job.

Here is a simple example:

import ray
import argparse

# The class must be a Ray actor for .options(...).remote() to work.
@ray.remote
class DatasetStore:
    def __init__(self):
        self.ds_store = {}

    def store(self, name, dataset):
        self.ds_store[name] = dataset

    def load(self, name):
        return self.ds_store.get(name)

ray.init(address="auto", namespace="myspace")
ds_store = DatasetStore.options(name="my_store", lifetime="detached", get_if_exists=True).remote()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--save', action='store_true')
    parser.add_argument('--load', action='store_true')
    args = parser.parse_args()

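    if args.save:
        # The original value was lost in the post; list(range(10)) is an
        # assumed stand-in that matches the output shown at the end.
        ds = list(range(10))
        # Wait for the actor to finish storing the dataset before exiting.
        ray.get(ds_store.store.remote('ds1', ds))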

    if args.load:
        ds1 = ray.get(ds_store.load.remote('ds1'))
        print(ds1)

Save the script under any name (script.py below is just a placeholder), then run:

python script.py --save

python script.py --load
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
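
If a colleague only wants to read what was published, without creating the actor, something like the sketch below might work. It assumes the store from the example above is already running under the name "my_store" in the "myspace" namespace; load_published and remove_store are hypothetical helper names, not part of Ray. ray.get_actor looks up an existing named actor and raises ValueError if none is found.

```python
STORE_NAME = "my_store"   # actor name from the example above
NAMESPACE = "myspace"     # namespace from the example above


def load_published(name, store_name=STORE_NAME, namespace=NAMESPACE):
    """Fetch a dataset by name from an already-running detached store actor."""
    # Imported here so the helper module can be imported without Ray installed.
    import ray

    if not ray.is_initialized():
        # Connect this driver to the existing cluster.
        ray.init(address="auto", namespace=namespace)
    # Look up the named actor; raises ValueError if it does not exist.
    store = ray.get_actor(store_name, namespace=namespace)
    # DatasetStore.load returns None for an unknown dataset name.
    return ray.get(store.load.remote(name))


def remove_store(store_name=STORE_NAME, namespace=NAMESPACE):
    """Tear down the detached actor once nobody needs the datasets anymore."""
    import ray

    store = ray.get_actor(store_name, namespace=namespace)
    # Detached actors outlive their creating driver, so they must be killed explicitly.
    ray.kill(store)
```

Keep in mind that a detached actor stays alive after every driver has disconnected, so the stored datasets remain until the actor is killed explicitly (as in remove_store) or the cluster shuts down.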