Deployment Graph vs ServeHandle

What is the benefit of using the new Deployment Graph API vs composing different ServeHandles?


Hi @rabraham, great question. In short, they’re equivalent in power and expressiveness. The difference is a stance we take between providing a static graph and a dynamic graph: a static definition makes it much easier for us to build new operational APIs for multi-model inference, and it leaves room for further performance optimization.

You can see more of this in our blog post:

The graph structure is hidden in the logic of the entire codebase. Without a static definition of the topology, our users need to read the entire codebase and track down each handle in order to see how the graph is composed. In order to test the graph, users need to manually write a deployment script that deploys each deployment in the correct topological order.
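To make the "correct topological order" point concrete, here is a minimal sketch of the ordering step such a hand-written deployment script would need. It uses only Python's standard library, not the Ray Serve API, and the deployment names and the `dependencies` mapping are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical graph: each deployment maps to the deployments it calls
# through a handle, which therefore have to be deployed before it.
dependencies = {
    "ensemble": {"model_a", "model_b"},
    "model_a": {"preprocessor"},
    "model_b": {"preprocessor"},
    "preprocessor": set(),
}


def deploy_order(deps):
    """Return deployment names with every dependency before its caller."""
    return list(TopologicalSorter(deps).static_order())


order = deploy_order(dependencies)
for name in order:
    # A real script would deploy `name` here; this sketch only prints it.
    print("deploy", name)
```

With handles scattered through the codebase, a mapping like `dependencies` has to be reconstructed by reading the code, which is the maintenance cost the quoted paragraph describes.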

It’s hard to operationalize the deployment graph for production. Given the observation above, it’s also difficult to take operational actions on the deployment graph without diving into the codebase, such as adjusting parameters like num_replicas or updating the link to the latest model weights.

It can be hard to optimize the graph. If you look carefully at the code snippet above, it calls await in a loop, which is an anti-pattern: we wait for each result to return one by one instead of parallelizing the calls. With a static graph definition, we can avoid these performance bugs and provide advanced support for optimizations such as fusing or co-locating nodes on the same host.
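The snippet the blog post refers to is not reproduced in this thread, so here is a rough illustration of the await-in-a-loop anti-pattern using plain asyncio rather than ServeHandles. `fake_model` is a hypothetical stand-in for a remote model call, and the 0.1 second delay is an arbitrary assumption.

```python
import asyncio
import time


async def fake_model(name: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for a remote model call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"{name}-result"


async def sequential(names):
    # Anti-pattern: each call must finish before the next one starts.
    results = []
    for name in names:
        results.append(await fake_model(name))
    return results


async def parallel(names):
    # Start all calls first, then wait for them together.
    return await asyncio.gather(*(fake_model(name) for name in names))


def timed(coro_fn, names):
    """Run `coro_fn(names)` and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(names))
    return results, time.perf_counter() - start


names = ["a", "b", "c"]
seq_results, seq_time = timed(sequential, names)
par_results, par_time = timed(parallel, names)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Both versions return the same results in the same order. The sequential one takes roughly the sum of the individual delays, while the gathered one takes roughly the longest single delay.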

Hi @Jiao_Dong
Thank you very much for that answer. It really helps. I’ll have to slowly unpack that article but it looks great so far. I’ll let you know if I have more questions.

Great Article. Thanks!

Minor question:

  • What does “IR” mean in “Expressing parallel calls is very trivial: just use the same variable. The same variable name (backed by an IR node behind the scene) will be resolved to the same value in separate nodes.”?

And a small FYI,
there is a typo (Genenal) in general_model = Genenal_Classification.bind(weights="s3://bucket/file")

Great catch! I just notified our content team to update it. Thanks for reading our blog post so carefully :slight_smile:

writing a thesis is of some use after all :wink:
