Adding a model compilation step (e.g., TensorRT) to offline inference

Hi Ray Team,

I have a Ray job for an offline inference workflow that uses Ray Data concepts like map_batches. I'd like to understand whether I can add a model compilation step before running the inference workflow. Suppose I have a model: before triggering the flow, how can I apply a model compilation framework such as TensorRT or Apache TVM, and then use the compiled version of the model in the flow? I assume that, by default, Ray Data does not provide an API for this.

Thank you

You're correct that Ray Data doesn't natively provide those APIs, but you should still be able to execute the workflow you described. Generally, it would look something like this (a rough code sketch follows the list):

  • Generate/compile your model (with any of the frameworks you mentioned)
  • Load the compiled model into your Ray Data pipeline (via a Ray Dataset workflow)
  • Use map_batches to apply the model to your data
  • Run your Ray Data workflow as usual
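
Here's a minimal sketch of that flow, under some assumptions I should flag: it uses Torch-TensorRT (`torch_tensorrt`) as the compiler, a torchvision ResNet-18 as a stand-in model, random arrays as stand-in data, and a Ray version recent enough that `map_batches` accepts `concurrency=` (older releases use `compute=ray.data.ActorPoolStrategy(...)` instead). The class name, shapes, and column names are all illustrative, not a prescribed API:

```python
import numpy as np
import ray
import torch
import torchvision  # example model source; swap in your own model


class TRTPredictor:
    """Stateful callable for map_batches: compile once per actor, reuse per batch."""

    def __init__(self):
        import torch_tensorrt  # import inside the actor process

        model = torchvision.models.resnet18(weights="DEFAULT").eval().cuda()
        # Compile for a fixed input shape; batch_size below must match it.
        self.model = torch_tensorrt.compile(
            model,
            inputs=[torch_tensorrt.Input((32, 3, 224, 224))],
        )

    def __call__(self, batch: dict) -> dict:
        images = torch.as_tensor(batch["image"], device="cuda")
        with torch.no_grad():
            batch["pred"] = self.model(images).argmax(dim=1).cpu().numpy()
        return batch


# Toy dataset of random "images"; in practice use read_parquet/read_images/etc.
ds = ray.data.from_items(
    [{"image": np.random.rand(3, 224, 224).astype(np.float32)} for _ in range(128)]
)

preds = ds.map_batches(
    TRTPredictor,
    batch_size=32,   # matches the shape the model was compiled for
    concurrency=2,   # two actor replicas (Ray >= 2.9)
    num_gpus=1,      # one GPU per replica
)
preds.show(5)
```

The one design point worth calling out is compiling inside the actor's `__init__`: compiled TensorRT/TVM engines generally aren't picklable, so each map_batches actor should build (or load from a saved engine file) its own copy rather than receiving one over the wire.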

We have a tutorial here that might be helpful: