How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi all!
I am trying to understand how to create feature lags on my dataframe, which is usually done with groupby + shift (e.g. group by customer id and shift, so that you build features about what the customer bought one period ago, two periods ago, etc.).
Is there any way this can be done in Ray Dataset? I don't see shift in the groupby methods, and it does not look like it could be done with map or map_batches (as shift would need to access the first/last item from a different batch).
Any way to shift data on Ray Dataset? Or am I forced to convert to Dask (and make a copy of the data, if my understanding is correct?)
Just discovered that Ray datasets has a map_groups function so now I’m assuming you can achieve this by grouping and mapping the groups to a pandas shift function. Will give it a try and report back.
Hi @Andrea_Pisoni - thanks for the question. Yes, you can use map_groups to keep the first and last item per group, and then do map_batches on the grouped data. Let us know how it works. Thanks. We don't support shift natively right now.
Hi Chengsu,
Thanks so much for the suggestion. Can you expand on what you mean?
I thought I would use the pandas.shift function on map_groups directly. How would you use map_batches instead? Each batch is not guaranteed to be a group right? How will that work if a group for example is split across three batches?
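To illustrate the concern about batches, here is a small pandas-only sketch that simulates what happens when one customer's rows are split across two batches and the shift is applied to each batch separately. The column names (customer_id, period, amount) and the two-way split are made up for the example; this does not call Ray at all, it only mimics the batch boundary.

```python
import pandas as pd

# Hypothetical data: a single customer with four consecutive periods.
df = pd.DataFrame({
    "customer_id": ["a", "a", "a", "a"],
    "period": [1, 2, 3, 4],
    "amount": [10, 20, 30, 40],
})

def shift_within(frame: pd.DataFrame) -> pd.DataFrame:
    """Add a one-period lag of `amount`, computed per customer within `frame`."""
    out = frame.copy()
    out["amount_lag_1"] = out.groupby("customer_id")["amount"].shift(1)
    return out

# Shift computed on the full data: every row after the first gets a lag.
whole = shift_within(df)

# Simulate two batches that split the customer's rows in the middle,
# then shift each batch independently (what a per-batch function would see).
batches = [df.iloc[:2], df.iloc[2:]]
per_batch = pd.concat([shift_within(b) for b in batches])

# Row with index label 2 is the first row of the second batch.
boundary_whole = whole.loc[2, "amount_lag_1"]      # lag comes from period 2
boundary_batch = per_batch.loc[2, "amount_lag_1"]  # lag is lost at the boundary
```

The row at the batch boundary loses its lag value when each batch is shifted on its own, which is the problem with using a per-batch function for this.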
Hi @Andrea_Pisoni - actually, after thinking about it again, I think using map_groups with pandas.shift should work. I think you need to keep the previous group in memory, so that for each group you know how to shift across the groups, right?
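For reference, a minimal sketch of the map_groups + pandas shift approach discussed above. It assumes the lags are computed per customer, so each group is processed on its own, and that each group fits in memory as a pandas DataFrame. The column names (customer_id, period, amount) and the helper names add_lags and lag_with_ray are hypothetical. The Ray part is wrapped in a function and needs Ray installed; it has not been run against any particular Ray version.

```python
import pandas as pd

def add_lags(group: pd.DataFrame) -> pd.DataFrame:
    """Add lag features to the rows of one customer (one complete group)."""
    # Sort by period so that shift(1) really means "one period ago".
    group = group.sort_values("period")
    for lag in (1, 2):
        group[f"amount_lag_{lag}"] = group["amount"].shift(lag)
    return group

def lag_with_ray(df: pd.DataFrame):
    """Apply add_lags per customer with Ray Data (assumes Ray is installed)."""
    import ray  # imported lazily so the pandas part runs without Ray

    ds = ray.data.from_pandas(df)
    # map_groups hands each whole group to add_lags as a pandas DataFrame.
    return ds.groupby("customer_id").map_groups(add_lags, batch_format="pandas")

# Hypothetical example data, deliberately out of order for customer "a".
df = pd.DataFrame({
    "customer_id": ["a", "b", "a", "a", "b"],
    "period": [3, 1, 1, 2, 2],
    "amount": [30, 5, 10, 20, 7],
})

# Check the per-group function on one customer with plain pandas.
lagged_a = add_lags(df[df["customer_id"] == "a"])
```

Calling `lag_with_ray(df).to_pandas()` should then return the lagged rows for all customers. Because the function receives a whole group at a time, the batch-boundary issue from map_batches does not come up within a customer.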