I have an actor that multiplies every number in a list by a constant factor m.
class MultiplyActor:
    def __init__(self, m):
        self.m = m

    def __call__(self, x):
        return [y * self.m for y in x]
I want to parameterize the actor by passing the value of m to its constructor.
I also want to use this actor in a Dataset.map_batches operation with the actor pool compute strategy. However, Dataset.map_batches requires that you pass it a class that takes no constructor arguments.
(This is a simplified example. In a real application, initializing my actor would be an expensive step whose result I’d want to cache.)
I made this work by passing Dataset.map_batches a closure, like so:
import ray
from ray.data import ActorPoolStrategy


class MultiplyActor:
    def __init__(self, m):
        self.m = m

    def __call__(self, x):
        return [y * self.m for y in x]


def main(m: int):
    # The closure captures m, so map_batches receives a callable
    # that needs no constructor arguments.
    def closure(x):
        # A new MultiplyActor is built on every call, i.e. once per batch.
        return MultiplyActor(m)(x)

    data_set = ray.data.range(5)
    data_set = data_set.map_batches(closure, compute=ActorPoolStrategy())
    data_set.show()


if __name__ == "__main__":
    main(3)
Is this the recommended way to parameterize actor pool actors in Ray?
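
One caveat about the closure: it constructs a new MultiplyActor for every batch, so an expensive __init__ would not be cached. For comparison, here is a minimal sketch of another way to bind m, assuming map_batches only needs a class whose constructor takes no arguments. The helper name make_multiply_actor and the class name BoundMultiplyActor are hypothetical and not part of Ray, and the sketch is plain Python that leaves the Ray call out.

```python
class MultiplyActor:
    def __init__(self, m):
        self.m = m

    def __call__(self, x):
        return [y * self.m for y in x]


def make_multiply_actor(m):
    """Hypothetical helper: return a MultiplyActor subclass with a no-argument constructor."""

    class BoundMultiplyActor(MultiplyActor):
        def __init__(self):
            # m is captured from the enclosing function call.
            super().__init__(m)

    return BoundMultiplyActor


# The returned class can be instantiated without arguments.
actor_cls = make_multiply_actor(3)
actor = actor_cls()
print(actor([0, 1, 2, 3, 4]))  # [0, 3, 6, 9, 12]
```

The returned class could presumably be passed to map_batches in place of the closure, so that each actor runs __init__ only once. That part is untested against any particular Ray version.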