How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
Hi Ray team,
I am new to Ray and I used to use Dask.
In Dask, there is a feature called Publish Datasets. By publishing a dataset, it is possible to read some data, perform some calculation, and share the result with other colleagues through a named reference. Published datasets continue to reside in distributed memory even after all clients requesting them have disconnected.
I would like to know whether there is a similar feature in Ray, i.e., is there a way to share objects/datasets between different driver processes?
Thanks!
Hi @Arsenal591, welcome to the community!
Sorry, we have been busy with the Ray 2.0 release, so the reply is a little delayed.
To answer your question: yes, this is doable. You can use a detached actor to hold the shared datasets — a detached actor outlives the driver that created it, so other drivers in the same namespace can look it up by name.
Here is a simple example:
```python
# example.py
import argparse

import ray


@ray.remote
class DatasetStore:
    """Holds named datasets so multiple drivers can share them."""

    def __init__(self):
        self.ds_store = {}

    def store(self, name, dataset):
        self.ds_store[name] = dataset

    def load(self, name):
        return self.ds_store.get(name)


# Connect to the running cluster; the namespace scopes actor names.
ray.init(address="auto", namespace="myspace")

# lifetime="detached" keeps the actor (and the data it holds) alive
# after this driver exits; get_if_exists=True reuses an existing one.
ds_store = DatasetStore.options(
    name="my_store", lifetime="detached", get_if_exists=True
).remote()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--save', action='store_true')
    parser.add_argument('--load', action='store_true')
    args = parser.parse_args()

    if args.save:
        ds = ray.data.range(10)
        ray.get(ds_store.store.remote('ds1', ds))
    if args.load:
        ds1 = ray.get(ds_store.load.remote('ds1'))
        print(ds1.take(10))
```
You can run:

```
python example.py --save
python example.py --load
```

and the second command prints:

```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```