I am trying to process a batch of data on a Ray cluster. I want to get the list of workers in my cluster so that I can emulate this example:
import ray

# They use a preset number of workers; I need to use however many workers my cluster has.
workers = [Worker.remote(i) for i in range(4)]
ds = ray.data.range(10000)
# -> Dataset(num_blocks=200, num_rows=10000, schema=<class 'int'>)
shards = ds.split(n=4)
# -> [Dataset(num_blocks=13, num_rows=2500, schema=<class 'int'>),
#     Dataset(num_blocks=13, num_rows=2500, schema=<class 'int'>), ...]
ray.get([w.train.remote(s) for w, s in zip(workers, shards)])
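For context, here is what I have been considering so far: deriving the worker count from `ray.nodes()`, which returns a list of dicts with an `"Alive"` flag. The `count_alive_nodes` helper below is my own (hypothetical) wrapper, and the "one worker actor per alive node" mapping is an assumption on my part, not something from the docs:

```python
def count_alive_nodes(nodes):
    # `nodes` has the shape returned by ray.nodes():
    # a list of dicts, each with an "Alive" boolean.
    return sum(1 for n in nodes if n.get("Alive"))

# Intended usage on a running cluster (assumption: one Worker actor per alive node):
# import ray
# ray.init(address="auto")
# n = count_alive_nodes(ray.nodes())
# workers = [Worker.remote(i) for i in range(n)]
# shards = ds.split(n=n)
```

I am not sure whether counting alive nodes is the right granularity (vs. counting CPUs via `ray.cluster_resources()`), which is part of my question.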
I was also wondering how to handle elasticity, i.e., how spot instances would work with this. For example, if a spot instance shuts down, I don't want the shard of the dataset assigned to it to be dropped.