How severely does this issue affect your experience of using Ray?
- None: Asking for design help.
I have the following Actor that holds a dataframe. I don’t want the actor to hold the dataframe in memory, so I put it in the object store and then get it when needed.
It feels like an anti-pattern because I explicitly put the df in the object store in load_dataset(). Then I manually de-reference it when needed in shape().
What is the best way to design an actor that holds data and functions that reference the data?
import pandas as pd
import ray
from sklearn.datasets import load_digits, load_iris, load_wine


@ray.remote
class DataSet:
    """This remote class wraps a scikit-learn dataset."""

    # Maps a dataset name to its scikit-learn loader function.
    dataset_dict = {
        'iris': load_iris,
        'wine': load_wine,
        'digits': load_digits,
    }

    def __init__(self, dataset_choice):
        self.dataset_choice = dataset_choice
        # Keep only object-store references on the actor, not the data itself.
        self.sklearn_data_ref, self.dataset_ref = self.load_dataset(dataset_choice)

    def load_dataset(self, dataset_choice):
        loader = self.dataset_dict[dataset_choice]
        sklearn_data = loader()
        dataset_df = pd.DataFrame(data=sklearn_data.data, columns=sklearn_data.feature_names)
        # Explicitly put both objects in the object store.
        sk_ref = ray.put(sklearn_data)
        dataset_ref = ray.put(dataset_df)
        return sk_ref, dataset_ref

    def shape(self):
        # Manually fetch the DataFrame from the object store when needed.
        dataset = ray.get(self.dataset_ref)
        return dataset.shape