Method chaining on datasets

Hi there, I am having a hard time chaining map / groupby calls on a Dataset in Ray and would like some hints on how to move forward.

Scenario: I am reading a parquet file that contains user data grouped by user. I call map_groups(fn_a), which works fine and returns a Dataset with pandas DataFrame as the batch format. Then I group the resulting dataset by id again and call map_groups(fn_b), which throws an exception. I am not sure why.

ds = ck_read_parquet_dir(s3files, columns)
grouped_ds = ds.groupby('id')
result = grouped_ds.map_groups(lambda x: ck1_compute_fs_and_extract(x, True), batch_format="pandas")
print(f"CCAR schema 1- {result.schema()}")
print(f"CCAR result type 1 - {result}")
print(f"CCAR result type 1 - {result.take(3)}")
print(f"CCAR result 1 - {type(result)}, {result.count()}")
print(f"CCAR result default_batch_format - {result.default_batch_format()}")

Result of the above print statements:

MapBatches(group_fn): 100%|██████████| 1/1 [00:00<00:00, 9.58it/s]
CCAR schema 1- PandasBlockSchema(names=['time', 'id', ])
CCAR result type 1 - Dataset(num_blocks=1, num_rows=16412, schema={time: int64, id: int64})
CCAR result 1 - <class ''>, 16412
CCAR result default_batch_format - <class 'pandas.core.frame.DataFrame'>

However, when I run the following, I get an error:
result2 = result.groupby('id').map_groups(lambda x: do_nothing(x))

def do_nothing(df: pd.DataFrame) -> pd.DataFrame:
    print("CCAR: Do NOTHING")
    return df

Error message:
(_sample_block pid=916, ip=) return self._table[[k[0] for k in key]].sample(n_samples, ignore_index=True)

TypeError: sample() got an unexpected keyword argument 'ignore_index'
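One thing that may be worth checking, since the failing call in the traceback is the pandas DataFrame.sample method: as far as I know, the ignore_index argument was only added to DataFrame.sample in pandas 1.3.0, so an older pandas on the worker would raise exactly this TypeError. Below is a minimal sketch of such a check. The helper name accepts_keyword is hypothetical (it is not part of Ray or pandas), and this is only an assumption about the cause, not a confirmed diagnosis.

```python
import inspect


def accepts_keyword(func, name):
    """Hypothetical helper: True if `func` can be called with keyword `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


try:
    import pandas as pd
except ImportError:  # pandas is not installed in this environment
    pd = None

if pd is not None:
    print("pandas version:", pd.__version__)
    print(
        "DataFrame.sample accepts ignore_index:",
        accepts_keyword(pd.DataFrame.sample, "ignore_index"),
    )
```

The error is raised inside a worker process (_sample_block), so the check would need to run in the same environment as the workers, not only on the machine that submits the job.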

As far as I understand, result is a Dataset, so one should be able to group it and apply map_groups again. My original code simply chained the calls: map_groups(fn_a).groupby('id').map_groups(fn_b)

Any help is highly appreciated.

Thank you

@ckapoor asking the Ray Data group to chime in
cc: @jianxiao @chengsu