BatchPredictors for TensorRT/AITemplate models


I was looking into using Ray AIR on a Ray cluster to perform offline batch inference on a large dataset. In some cases the model is accelerated with TensorRT or AITemplate (AIT). To improve cost, latency, and throughput, I wanted to use these models with Ray AIR BatchPredictors.

My understanding is that checkpoints can only be loaded for the predefined ML frameworks. How should I go about writing a simple wrapper that loads weights for models optimized by these compilers?

Or should I write the code using raw actors and ActorPools?
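For context, the wrapper I have in mind would follow the usual custom-predictor shape: a `from_checkpoint()` classmethod that deserializes the engine once per worker, and a `predict()` method that runs one batch through it (in Ray AIR this would roughly mean subclassing `ray.train.predictor.Predictor` and implementing `_predict_pandas`). Here is a minimal, framework-free sketch of that pattern; `StubEngine`, `EnginePredictor`, and the `engine.pkl` file name are hypothetical stand-ins, with the stub replacing the real TensorRT/AIT engine so the sketch runs without a GPU:

```python
import pickle
import tempfile
from pathlib import Path


class StubEngine:
    """Stand-in for a deserialized TensorRT/AIT engine (hypothetical)."""

    def __init__(self, scale):
        self.scale = scale

    def infer(self, batch):
        # A real engine would execute the compiled graph on the GPU here.
        return [x * self.scale for x in batch]


class EnginePredictor:
    """Mirrors the shape of a custom predictor wrapper:
    from_checkpoint() deserializes the engine once per worker,
    predict() runs a single batch through it."""

    def __init__(self, engine):
        self.engine = engine

    @classmethod
    def from_checkpoint(cls, checkpoint_dir):
        # A real TensorRT wrapper would instead deserialize the plan file,
        # e.g. via trt.Runtime(...).deserialize_cuda_engine(engine_bytes).
        with open(Path(checkpoint_dir) / "engine.pkl", "rb") as f:
            return cls(pickle.load(f))

    def predict(self, batch):
        return self.engine.infer(batch)


# Usage: write a "checkpoint", then restore a predictor from it.
with tempfile.TemporaryDirectory() as ckpt:
    with open(Path(ckpt) / "engine.pkl", "wb") as f:
        pickle.dump(StubEngine(scale=2), f)
    predictor = EnginePredictor.from_checkpoint(ckpt)
    result = predictor.predict([1, 2, 3])
```

The point of the split is that the expensive engine deserialization happens once per worker in `from_checkpoint()`, while `predict()` stays cheap per batch, which is the same lifecycle a BatchPredictor or an ActorPool of inference actors would give you.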


It’s better to ask this question in the Ray AIR category.

cc: @kai @matthewdeng