When using Ray AIR, does the preprocessor that gets passed into, say, TensorflowTrainer via the preprocessor kwarg get “attached” to the final trained model, e.g. the one accessible via ray.tune.Tuner.fit().get_best_result()? The Ray AIR documentation, e.g. the batch inference section, suggests that it does (the test dataset used for prediction is split from the same dataset as the train dataset, but is used w/o preprocessing), but does not explicitly call this out.
If it does, at a high level, what are the underlying mechanisms used for doing so, and how do they differ across frameworks, e.g. TF, PT, XGB?
Hey John, thanks a bunch for opening this!
It doesn’t get attached to the model per se but rather to the Checkpoint that is generated. All checkpoints have a get_preprocessor method (see the Ray AIR API reference for Ray 2.0.0), which helps you achieve what you’re looking for.
This is framework-agnostic, since the Checkpoint holds the framework-specific model as a blob.
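To make the idea concrete, here is a minimal pure-Python sketch of a checkpoint that stores a fitted preprocessor next to an opaque model blob. This is not Ray's implementation: ToyCheckpoint, StandardScaler1D, from_parts, get_model and the dict layout are hypothetical names made up for illustration, and only the get_preprocessor method name comes from the post above.

```python
import pickle
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class StandardScaler1D:
    """Hypothetical preprocessor: standardizes a list of floats."""

    mu: float = 0.0
    sigma: float = 1.0

    def fit(self, values):
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0  # fall back to 1.0 for constant data
        return self

    def transform(self, values):
        return [(v - self.mu) / self.sigma for v in values]


class ToyCheckpoint:
    """Hypothetical checkpoint: a dict with a model blob and a preprocessor."""

    def __init__(self, data):
        self._data = data

    @classmethod
    def from_parts(cls, model, preprocessor):
        # The model is stored as opaque bytes, so the checkpoint itself
        # needs no framework-specific logic.
        return cls({"model": pickle.dumps(model), "preprocessor": preprocessor})

    def get_preprocessor(self):
        return self._data.get("preprocessor")

    def get_model(self):
        return pickle.loads(self._data["model"])


scaler = StandardScaler1D().fit([1.0, 2.0, 3.0])
model = {"weight": 2.0}  # stand-in for any framework's model object
ckpt = ToyCheckpoint.from_parts(model, scaler)
restored = ckpt.get_preprocessor()
```

The point of the sketch is that the preprocessor travels with the checkpoint rather than being fused into the model, which is why the same mechanism can work for TF, PT, or XGB models.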
Thanks Richard. It sounds in that case like it’s the user’s responsibility to replicate the preprocessing logic appropriately at serving time, e.g. using TorchServe custom handlers. Is that a fair assessment?
Yep that’s right. We’ve been in conversation with @Keshi_Dai1/@Keshi_Dai about having this attach to TF models automatically.
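As a rough illustration of what “replicating the preprocessing at serving time” could look like outside of Ray, here is a small sketch of a request handler that applies the training-time transform before calling the model. Everything here is a stand-in: make_handler, preprocess, predict and TRAIN_MEAN are hypothetical, and in practice the transform would come from the checkpoint's preprocessor and the handler would live in whatever serving framework is used (e.g. a TorchServe custom handler).

```python
from typing import Callable, List

Batch = List[float]


def make_handler(
    preprocess: Callable[[Batch], Batch],
    predict: Callable[[Batch], Batch],
) -> Callable[[Batch], Batch]:
    """Compose the training-time transform with the model's predict call."""

    def handle(raw_batch: Batch) -> Batch:
        # Raw inputs go through the same transform used during training
        # before they reach the model.
        return predict(preprocess(raw_batch))

    return handle


# Hypothetical stand-ins for a fitted preprocessor and a trained model.
TRAIN_MEAN = 10.0


def preprocess(batch: Batch) -> Batch:
    return [x - TRAIN_MEAN for x in batch]


def predict(features: Batch) -> Batch:
    return [2.0 * f for f in features]


handle = make_handler(preprocess, predict)
result = handle([10.0, 11.0])
```

Keeping the transform and the model behind a single entry point is one way to avoid training/serving skew, since callers never see the unpreprocessed model directly.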
Doesn’t surprise me. I presume that one of the primary blockers to this would be implementing the attaching mechanism in a framework-agnostic way, rather than making something special-case for TF specifically?
Yeah, this will need to be designed in a framework-agnostic way, but with hooks for TF.
Anywhere I or other users could go to track the progress of that work/discussion? Or has this mostly been informal so far?
Mostly informal. We have to find funding for that project, and I think we will ask you guys to put together some rough requirements on the usage before we write any lines of code.
@rliaw, could you please elaborate more on the point of “the user’s responsibility to replicate the preprocessing logic appropriately at serving time”? Is this specific to “gluing the preprocessor to the trained TF model artifact”, or do users still need to do this when they use Ray AIR’s Checkpoint?
I would like to understand the routes (with and without Ray Serve) via Checkpoint in Ray AIR to ensure that the feature transformation logic is consistent between training and serving. Thank you!