Partition by a key

Praveen · July 15, 2022, 5:33pm

How severely does this issue affect your experience of using Ray?

Medium: It contributes significant difficulty to completing my task, but I can work around it.

I am working on a pipeline that needs to run a mapper after partitioning the data by a key. Does Ray currently provide this functionality?
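To pin down the operation being asked about, here is a minimal single-process sketch in plain Python of "partition by a key, then run a mapper on each partition". It is only a model of the semantics, not how Ray implements it. The helpers `partition_by_key`, `map_partitions` and the mapper `add_group_size` are hypothetical names chosen for illustration. A distributed engine additionally has to shuffle rows so that all rows sharing a key reach the same worker before the mapper runs.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List

Row = Dict[str, Any]


def partition_by_key(rows: List[Row], key: str) -> Dict[Hashable, List[Row]]:
    """Hypothetical helper: group rows into one partition per distinct value of `key`."""
    partitions: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def map_partitions(
    partitions: Dict[Hashable, List[Row]],
    mapper: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Hypothetical helper: call `mapper` once per partition and concatenate the results."""
    result: List[Row] = []
    # Keys are sorted only to make the output order deterministic;
    # this assumes the key values are mutually comparable.
    for key_value in sorted(partitions):
        result.extend(mapper(partitions[key_value]))
    return result


def add_group_size(group: List[Row]) -> List[Row]:
    """Hypothetical mapper: tag every row with the size of its partition."""
    return [{**row, "group_size": len(group)} for row in group]


rows = [
    {"A": "a", "B": 1},
    {"A": "b", "B": 3},
    {"A": "a", "B": 1},
]
out = map_partitions(partition_by_key(rows, "A"), add_group_size)
print(out)
# [{'A': 'a', 'B': 1, 'group_size': 2}, {'A': 'a', 'B': 1, 'group_size': 2}, {'A': 'b', 'B': 3, 'group_size': 1}]
```

The mapper sees every row of a partition at once, which is what makes per-key aggregates such as the group size available inside it.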

jianxiao · August 1, 2022, 4:59pm

Hi @Praveen, Ray has supported this since 1.12.0: call .groupby(my_key) to partition the dataset by key, then call .map_groups(my_mapper_func) to run your mapper on each group.

For example:

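            >>> import pandas as pd
            >>> import ray
            >>> df = pd.DataFrame(
            ...     {"A": ["a", "a", "b"], "B": [1, 1, 3], "C": [4, 6, 5]}
            ... )
            >>> ds = ray.data.from_pandas(df)
            >>> grouped = ds.groupby("A")
            >>> # Normalize B and C within each group; column A passes through unchanged.
            >>> normalized = grouped.map_groups(
            ...     lambda g: g.apply(
            ...         lambda c: c / g[c.name].sum() if c.name in ["B", "C"] else c
            ...     ),
            ...     batch_format="pandas",  # hand each group to the mapper as a pandas DataFrame
            ... )
            >>> normalized.to_pandas()  # materialize the result to inspect it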
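To check what the mapper above should produce, the same per-group normalization can be computed with plain pandas, without Ray. This is a cross-check sketch, not part of the original answer. It uses `groupby(...).transform("sum")`, which returns a Series aligned with the original rows in which each row holds its group's sum, so ordinary division normalizes within each group.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 1, 3], "C": [4, 6, 5]})

grouped = df.groupby("A")
# transform("sum") broadcasts each group's column sum back onto that group's rows,
# so dividing element-wise normalizes B and C within each value of A.
expected = df.assign(
    B=df["B"] / grouped["B"].transform("sum"),
    C=df["C"] / grouped["C"].transform("sum"),
)
print(expected)
```

For group "a", B becomes 0.5 and 0.5 and C becomes 0.4 and 0.6; for group "b", both B and C become 1.0. The Ray output may not preserve the input row order, so it is safest to compare the two results group by group.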
