What's the best way to compare across models and across observation space?

Hi all,

I have a work project with 10,000 datasets, where each dataset is split into training, validation, and test data.

The data-mining/machine-learning problem is binary classification, and of course there are a bunch of models to compare and select from, e.g. Logistic Regression, XGBoost, SVC, deep learning, etc.

I have to compare classification performance metrics across all datasets and all models.

So the workflow is:

for model in models:
    for data in datasets:
        train the model on data, and collect the test metrics
    collect the metric values over all datasets and compute their mean
compare the means across models and make a table of the metrics for all models.
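
Concretely, what I do by hand today looks roughly like the sketch below (scikit-learn style; load_dataset is just a placeholder for however the splits are actually read in my pipeline):

from statistics import mean
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

n_datasets = 10000
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "svc": SVC(probability=True),
}

results = {}
for name, model in models.items():
    scores = []
    for dataset_id in range(n_datasets):
        # load_dataset is a placeholder for however the train/test splits are stored
        X_train, y_train, X_test, y_test = load_dataset(dataset_id)
        clf = clone(model)          # fresh, unfitted copy per dataset
        clf.fit(X_train, y_train)
        y_prob = clf.predict_proba(X_test)[:, 1]
        scores.append(roc_auc_score(y_test, y_prob))
    results[name] = mean(scores)    # mean test AUC per model

print(results)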


Is there an easy-to-use framework already available for this kind of model comparison/selection process?

Since these are common models, I would imagine there are already out-of-the-box solutions for this kind of model comparison.

Could anybody give me some pointers?

Thanks a lot!

Hey @Mike_Dwell_Siegmund,

This is an interesting use case! Off the top of my head, you may want to use Ray Tune to orchestrate this; it has some out-of-the-box tooling for handling metrics (printing a table to the command line, uploading to TensorBoard, etc.).
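
Very roughly, one way to set it up (just a sketch, and the reporting API differs a bit between Ray versions) is to make the model name and the dataset index grid-search parameters, so that every (model, dataset) pair becomes one Tune trial; train_and_eval below is a placeholder for your own training and evaluation code:

from collections import defaultdict
from statistics import mean
from ray import tune

def trainable(config):
    # config["model_name"] and config["dataset_id"] select one (model, dataset) pair;
    # train_and_eval is a placeholder for your own training + evaluation code.
    auc = train_and_eval(config["model_name"], config["dataset_id"])
    tune.report(auc=auc)

analysis = tune.run(
    trainable,
    config={
        "model_name": tune.grid_search(["logreg", "xgboost", "svc"]),
        "dataset_id": tune.grid_search(list(range(100))),  # a subset, for illustration
    },
)

# Aggregate the per-trial metrics into one mean per model.
per_model = defaultdict(list)
for trial in analysis.trials:
    per_model[trial.config["model_name"]].append(trial.last_result["auc"])
print({name: mean(scores) for name, scores in per_model.items()})

Tune will also schedule these trials in parallel across whatever CPUs/GPUs you give it, which matters a lot with 10,000 datasets.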

For each individual model, depending on how your datasets are stored and organized, Ray Datasets may be able to provide the right abstraction here: it would let you process each input dataset individually and then aggregate the results.
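
For example (assuming, purely hypothetically, one Parquet directory per dataset split with a label column), something along these lines:

import ray

# Hypothetical layout: one Parquet directory per dataset split.
ds = ray.data.read_parquet("s3://my-bucket/datasets/0000/train/")

print(ds.count(), ds.mean("label"))  # quick per-dataset sanity checks
df = ds.to_pandas()                  # small datasets can be handed to sklearn/XGBoost directly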

Could you share some more details about your workload, e.g. the size and format of the datasets?