Partition by a key

How severely does this issue affect your experience of using Ray?

  • Medium: It makes completing my task significantly more difficult, but I can work around it.

I am working on a pipeline that requires running a mapper after partitioning by a key. Does Ray currently provide this functionality?

Hi @Praveen, Ray has supported this since 1.12.0: call .groupby(my_key) on the dataset, then .map_groups(my_mapper_func) to run your mapper on each group.

For example:

            >>> import pandas as pd
            >>> import ray
            >>> df = pd.DataFrame(
            ...     {"A": ["a", "a", "b"], "B": [1, 1, 3], "C": [4, 6, 5]}
            ... )
            >>> ds = ray.data.from_pandas(df)
            >>> grouped = ds.groupby("A")
            >>> grouped.map_groups(
            ...     lambda g: g.apply(
            ...         lambda c: c / g[c.name].sum() if c.name in ["B", "C"] else c
            ...     )
            ... )
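
In case the semantics are unclear, here is a rough plain-Python sketch of what "partition by a key, then run a mapper on each partition" computes for the same three rows. This is only an illustration and not how Ray implements it: `partition_by_key`, `normalize_group`, and `map_groups` below are hypothetical helper names made up for this sketch, and rows are plain dicts rather than a Ray dataset.

```python
from collections import defaultdict


def partition_by_key(rows, key):
    """Group row dicts by row[key], keeping keys in first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def normalize_group(group, columns=("B", "C")):
    """Hypothetical mapper: divide each listed column by its sum within the group."""
    totals = {c: sum(r[c] for r in group) for c in columns}
    return [
        {k: (v / totals[k] if k in totals else v) for k, v in row.items()}
        for row in group
    ]


def map_groups(rows, key, mapper):
    """Apply the mapper to each partition and concatenate the results."""
    out = []
    for group in partition_by_key(rows, key).values():
        out.extend(mapper(group))
    return out


# Same data as the DataFrame in the example above.
rows = [
    {"A": "a", "B": 1, "C": 4},
    {"A": "a", "B": 1, "C": 6},
    {"A": "b", "B": 3, "C": 5},
]
result = map_groups(rows, "A", normalize_group)
print(result)
```

The mapper only ever sees the rows that share one key value, which is why the sums used for normalization are per group rather than over the whole dataset.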