How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
I am working on a pipeline that requires running a mapper after partitioning the data by a key. Does Ray currently provide such functionality?
Hi @Praveen, Ray has supported this since 1.12.0: call `.groupby(my_key)` and then `.map_groups(my_mapper_func)`.
For example:
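>>> import pandas as pd
>>> import ray
>>> df = pd.DataFrame(
...     {"A": ["a", "a", "b"], "B": [1, 1, 3], "C": [4, 6, 5]}
... )
>>> ds = ray.data.from_pandas(df)
>>> grouped = ds.groupby("A")
>>> # Within each group, divide columns B and C by their group sums;
>>> # the key column A passes through unchanged.
>>> # batch_format="pandas" hands each group to the lambda as a pandas DataFrame.
>>> result = grouped.map_groups(
...     lambda g: g.apply(
...         lambda c: c / g[c.name].sum() if c.name in ["B", "C"] else c
...     ),
...     batch_format="pandas",
... )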
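To make the "partition by key, then run a mapper on each partition" semantics concrete without a Ray cluster, here is a minimal single-process sketch in plain Python. It is not Ray's implementation: `map_groups_local` and `normalize` are hypothetical names chosen for illustration, and the in-memory sort stands in for the distributed shuffle that Ray performs across workers.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def map_groups_local(
    rows: Iterable[Row],
    key: str,
    fn: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Hypothetical helper: partition rows by `key`, apply `fn` per group, concatenate.

    Single-process illustration only; it does not reproduce Ray's
    distributed execution.
    """
    # Sort so that itertools.groupby sees each key as one contiguous run.
    ordered = sorted(rows, key=itemgetter(key))
    out: List[Row] = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        # The mapper receives every row for one key and may return any number of rows.
        out.extend(fn(list(group)))
    return out


def normalize(group: List[Row]) -> List[Row]:
    """Hypothetical mapper: divide B and C by their per-group sums."""
    totals = {col: sum(r[col] for r in group) for col in ("B", "C")}
    return [
        {**r, "B": r["B"] / totals["B"], "C": r["C"] / totals["C"]}
        for r in group
    ]


rows = [
    {"A": "a", "B": 1, "C": 4},
    {"A": "a", "B": 1, "C": 6},
    {"A": "b", "B": 3, "C": 5},
]
result = map_groups_local(rows, "A", normalize)
for r in result:
    print(r)
```

With the same data as the pandas example, both "a" rows get B = 0.5 and C = 0.4 and 0.6 respectively, and the single "b" row gets B = C = 1.0.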