I am looking into using Ray AIR and a Ray cluster to perform offline batch inference on a large dataset. In some cases the model is accelerated with TensorRT or AIT. To reduce cost and latency, I want to use these optimized models with Ray AIR's BatchPredictor.
My understanding is that checkpoints can only be loaded for the predefined ML frameworks. How should I go about writing a simple wrapper to load weights for models optimized with XLA-style compilers like these?
Or should I write the code using raw actors and an ActorPool instead?
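
For concreteness, here is a rough sketch of the kind of wrapper I have in mind. It is only an illustration: `load_engine`, `OptimizedModelPredictor`, the `"data"`/`"output"` keys, and the `model.engine` path are all hypothetical names, and the stub does not touch TensorRT or AIT at all. The idea is simply to load the optimized engine once in the constructor and reuse it for every batch.

```python
from typing import Callable, Dict, List

Batch = Dict[str, List[float]]


def load_engine(path: str) -> Callable[[List[float]], List[float]]:
    """Hypothetical stand-in for deserializing a TensorRT/AIT engine from `path`."""
    # Assumption: a real version would build the runtime here.
    # This stub ignores `path` and just doubles every input value.
    return lambda xs: [2.0 * x for x in xs]


class OptimizedModelPredictor:
    """Callable class: the engine is loaded once, then reused for each batch."""

    def __init__(self, engine_path: str):
        # Expensive setup happens once per instance (once per actor, if this
        # class is run inside long-lived actors).
        self.engine = load_engine(engine_path)

    def __call__(self, batch: Batch) -> Batch:
        # Run inference on one batch and return the results under a new key.
        return {"output": self.engine(batch["data"])}


predictor = OptimizedModelPredictor("model.engine")  # hypothetical path
result = predictor({"data": [1.0, 2.0, 3.0]})
print(result)
```

As far as I can tell, a callable class shaped like this could either be run inside actors that I manage myself, or be handed to Ray Data's `map_batches` with an actor-based compute strategy so that each actor builds the engine only once. The exact arguments seem to depend on the Ray version, so I have left that part out.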