Loading data for XGBoost_ray

What is the recommended approach for using a list of pandas/numpy datasets that are already loaded in the Ray plasma store with xgboost_ray?

This currently isn’t supported; we should file a feature request for this. How big are your datasets?

The datasets are over 100 GB.

@kai any recommendation here?

BTW, I also posted a feature request here

Hi @Javier_Bosch, sorry, this thread totally slipped.

As a matter of fact, we already have this functionality implemented. You can just pass RayDMatrix(object_ref) to make it work.

However, this currently works only for single references. This PR (Support object store references as data source, by krfricke, Pull Request #139 in ray-project/xgboost_ray) introduces a very simple change that enables passing lists of object references, e.g. RayDMatrix([obj1, obj2, obj3, obj4], "label").
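For reference, here is a minimal sketch of how that could look end to end. It assumes an xgboost_ray version that includes the PR above, that each object reference points to a pandas DataFrame, and that the label column is called "label". The helpers `shard_bounds`, `put_shards` and `train_from_refs`, as well as the training parameters and actor counts, are made up for illustration and are not part of xgboost_ray.

```python
def shard_bounds(n_rows, n_shards):
    """Split range(n_rows) into n_shards contiguous (start, stop) slices."""
    base, extra = divmod(n_rows, n_shards)
    bounds, start = [], 0
    for i in range(n_shards):
        # The first `extra` shards get one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def put_shards(df, n_shards):
    """Put each row shard of a pandas DataFrame into the Ray object store."""
    import ray  # imported lazily so shard_bounds has no dependencies

    return [ray.put(df.iloc[a:b]) for a, b in shard_bounds(len(df), n_shards)]


def train_from_refs(refs, label="label"):
    """Train on a list of object references (hypothetical wrapper)."""
    from xgboost_ray import RayDMatrix, RayParams, train

    # A list of object references as the data source, label column by name.
    dtrain = RayDMatrix(refs, label)
    return train(
        {"objective": "binary:logistic"},  # assumed task, adjust as needed
        dtrain,
        num_boost_round=10,
        ray_params=RayParams(num_actors=2, cpus_per_actor=1),
    )
```

If the data is already in the object store, as in your case, only `train_from_refs(refs)` is needed; `put_shards` is just there to show one way such a list of references could be produced.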

Let me know if this works for you!
