What is the recommended approach for using a list of pandas/numpy datasets already loaded in the Ray plasma store with xgboost_ray?
This currently isn’t supported; we should file a feature request for this. How big are your datasets?
The datasets are over 100 GB.
@kai any recommendation here?
BTW, I also posted an FR here
Hi @Javier_Bosch, sorry, this thread totally slipped.
As a matter of fact, we already have this functionality implemented. You can just pass RayDMatrix(object_ref) to make it work.
However, this currently works only for single references. This PR: Support object store references as data source by krfricke · Pull Request #139 · ray-project/xgboost_ray · GitHub introduces a very simple change that enables passing lists of object references, e.g. RayDMatrix([obj1, obj2, obj3, obj4], "label").
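For reference, here is a minimal sketch of what passing a list of object references might look like end to end. It assumes a build of xgboost_ray that includes the PR above. The helper names (`make_shards`, `train_on_object_refs`), the column names, the shard sizes, and the training parameters are made up for illustration; your real data would already be in the object store, so you would skip the `ray.put` step.

```python
import numpy as np
import pandas as pd


def make_shards(n_shards=4, rows=250, n_features=5, seed=0):
    """Build small stand-in DataFrames, each with a 'label' column (illustrative only)."""
    rng = np.random.default_rng(seed)
    shards = []
    for _ in range(n_shards):
        df = pd.DataFrame(
            rng.normal(size=(rows, n_features)),
            columns=[f"f{i}" for i in range(n_features)],
        )
        # Hypothetical binary target derived from the first feature.
        df["label"] = (df["f0"] > 0).astype(int)
        shards.append(df)
    return shards


def train_on_object_refs(shards, num_actors=2):
    """Put each shard in the object store and train from the list of references."""
    # Imported here so make_shards can be used without Ray installed.
    import ray
    from xgboost_ray import RayDMatrix, RayParams, train

    ray.init(ignore_reinit_error=True)

    # ray.put stores each DataFrame in the object store and returns an ObjectRef.
    refs = [ray.put(df) for df in shards]

    # List-of-references form described in this thread; "label" names the
    # target column inside each DataFrame.
    dtrain = RayDMatrix(refs, "label")

    return train(
        {"objective": "binary:logistic", "eval_metric": ["logloss"]},
        dtrain,
        num_boost_round=10,
        ray_params=RayParams(num_actors=num_actors, cpus_per_actor=1),
    )
```

Calling `train_on_object_refs(make_shards())` on a machine with Ray and xgboost_ray installed should return a trained booster. The sketch uses more shards than actors; whether the opposite works for your setup is something to verify.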
Let me know if this works for you!