What's the best way to compare across models and across observation space?

Hi all,

I have a work project with 10,000 datasets, where each dataset is split into training, validation, and test data.

The data-mining/machine-learning problem is binary classification, and of course there are a bunch of models to compare and select from, e.g. Logistic Regression, XGBoost, SVC, deep learning, etc.

I have to compare classification performance metrics across all datasets and all models.

So the workflow is:

for model in models:
    for data in datasets:
        train the model on data, and collect the test metrics
    collect the metric values over all datasets and compute their mean
compare the means across models and make a table of the metrics for all models.
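
Concretely, what I do by hand today looks roughly like the sketch below (scikit-learn style; load_dataset is just a placeholder for however the splits are actually read in my pipeline):

from statistics import mean
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

n_datasets = 10000
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "svc": SVC(probability=True),
}

results = {}
for name, model in models.items():
    scores = []
    for dataset_id in range(n_datasets):
        # load_dataset is a placeholder for however the train/test splits are stored
        X_train, y_train, X_test, y_test = load_dataset(dataset_id)
        clf = clone(model)          # fresh, unfitted copy per dataset
        clf.fit(X_train, y_train)
        y_prob = clf.predict_proba(X_test)[:, 1]
        scores.append(roc_auc_score(y_test, y_prob))
    results[name] = mean(scores)    # mean test AUC per model

print(results)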


Is there an easy-to-use framework already available for this kind of model comparison/selection process?

Since these are common models, I would imagine there are already out-of-the-box solutions for this kind of model comparison.

Could anybody give me some pointers?

Thanks a lot!

Hey @Mike_Dwell_Siegmund,

This is an interesting use case! Off the top of my head, you may want to use Ray Tune to orchestrate this; it has some out-of-the-box tooling for handling metrics (printing a table to the command line, uploading to TensorBoard, etc.).
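
Very roughly, one way to set it up (just a sketch, and the reporting API differs a bit between Ray versions) is to make the model name and the dataset index grid-search parameters, so that every (model, dataset) pair becomes one Tune trial; train_and_eval below is a placeholder for your own training and evaluation code:

from collections import defaultdict
from statistics import mean
from ray import tune

def trainable(config):
    # config["model_name"] and config["dataset_id"] select one (model, dataset) pair;
    # train_and_eval is a placeholder for your own training + evaluation code.
    auc = train_and_eval(config["model_name"], config["dataset_id"])
    tune.report(auc=auc)

analysis = tune.run(
    trainable,
    config={
        "model_name": tune.grid_search(["logreg", "xgboost", "svc"]),
        "dataset_id": tune.grid_search(list(range(100))),  # a subset, for illustration
    },
)

# Aggregate the per-trial metrics into one mean per model.
per_model = defaultdict(list)
for trial in analysis.trials:
    per_model[trial.config["model_name"]].append(trial.last_result["auc"])
print({name: mean(scores) for name, scores in per_model.items()})

Tune will also schedule these trials in parallel across whatever CPUs/GPUs you give it, which matters a lot with 10,000 datasets.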

For each individual model, depending on how your datasets are stored and organized, Ray Datasets may be able to provide the right abstraction here: it would let you process each input dataset individually and then aggregate the results.
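
For example (assuming, purely hypothetically, one Parquet directory per dataset split with a label column), something along these lines:

import ray

# Hypothetical layout: one Parquet directory per dataset split.
ds = ray.data.read_parquet("s3://my-bucket/datasets/0000/train/")

print(ds.count(), ds.mean("label"))  # quick per-dataset sanity checks
df = ds.to_pandas()                  # small datasets can be handed to sklearn/XGBoost directly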

Could you share some more details about your workload, e.g. the size and format of the datasets?