Hi, I am a Ray newbie. I want to create a new dataset column called embedding by passing each row to the GetEmbeddings class, and I want the embedding step to run in parallel. Is this possible? Currently I get the following error:
Standalone Python objects are not allowed in Ray 2.5. To return Python objects from map(), wrap them in a dict, e.g., return "{'item': item}" instead of just "item"
Here is my code (assume ds is a Ray Dataset):

import ray

@ray.remote
class GetEmbeddings:
    def __init__(self):
        # Do something
        pass

    def get_response(self, row):
        # Get some_embedding_list by running a model on row["text"]
        row["embedding"] = some_embedding_list
        return row

process_embedding = GetEmbeddings.remote()
ds = ds.map(process_embedding.get_response.remote)
Thanks!