pandas has a sample API. How can I do the same thing in Ray Datasets?
My traceback:
...
  File "starter.py", line 35, in train_xgboost
    X_train = data.sample(frac=1 - test_fraction)
AttributeError: 'Dataset' object has no attribute 'sample'
Thank you!
I filed a GitHub issue to add random_sample(): "[data] Add dataset.random_sample() API" (Issue #24449 in ray-project/ray).
For now, you can implement random sampling “manually” with ds.map_batches(lambda batch: batch.sample(...)).take(). This assumes each batch is a pandas DataFrame, since sample is a pandas method.
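The map_batches workaround could be sketched roughly as follows. This is only an illustration, not an official Ray API: the helper names sample_batch and sample_dataset are made up for this example, and it assumes the dataset supports batch_format="pandas" so that each batch arrives as a pandas DataFrame.

```python
import pandas as pd


def sample_batch(batch: pd.DataFrame, frac: float) -> pd.DataFrame:
    """Return a random subset of one batch, keeping about `frac` of its rows."""
    return batch.sample(frac=frac)


def sample_dataset(ds, frac: float):
    """Apply sample_batch to every batch of a Ray Dataset.

    Assumption: `ds` is a ray.data.Dataset and batch_format="pandas" is
    available, so the lambda receives pandas DataFrames.
    """
    return ds.map_batches(
        lambda batch: sample_batch(batch, frac),
        batch_format="pandas",
    )
```

With this in place, something like X_train = sample_dataset(data, 1 - test_fraction) would stand in for the failing line in the traceback. Because sampling happens independently within each batch, the total number of rows kept is only approximately frac of the dataset. Also note that take() returns only a limited number of rows, so it is mainly useful for inspecting the result rather than materializing the whole sample.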