Hi, meeting SLOs (e.g., per-request deadlines) is an important requirement for model serving. Is there any example code that implements SLO guarantees for model serving with Ray Serve? If not, and I want to implement this myself, do you have any suggestions? Based on my understanding, I could imitate `serve.batch` to achieve this.
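To make the idea concrete, here is a minimal sketch of the kind of deadline-aware batching I have in mind. It is plain Python and does not use any Ray Serve API. `DeadlineBatcher`, `est_latency_s`, and `slack_s` are hypothetical names I made up for illustration, and the estimated model latency is assumed to be known in advance. Requests are kept in earliest-deadline-first order; a batch is flushed when it is full or when the most urgent request is about to run out of time, and requests that can no longer meet their deadline are returned separately so they can be rejected early.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass(order=True)
class _Entry:
    deadline: float                      # absolute deadline, in seconds
    seq: int                             # tie-breaker to keep FIFO order
    payload: Any = field(compare=False)  # the request itself


class DeadlineBatcher:
    """Hypothetical helper (not part of Ray Serve): EDF queue with batching."""

    def __init__(self, max_batch_size: int, est_latency_s: float,
                 slack_s: float = 0.0) -> None:
        self.max_batch_size = max_batch_size
        self.est_latency_s = est_latency_s  # assumed model latency per batch
        self.slack_s = slack_s              # safety margin before a deadline
        self._heap: List[_Entry] = []
        self._seq = itertools.count()

    def submit(self, payload: Any, deadline: float) -> None:
        heapq.heappush(self._heap, _Entry(deadline, next(self._seq), payload))

    def should_flush(self, now: float) -> bool:
        """True if the batch is full or the earliest deadline is close."""
        if not self._heap:
            return False
        if len(self._heap) >= self.max_batch_size:
            return True
        earliest = self._heap[0].deadline
        return now + self.est_latency_s + self.slack_s >= earliest

    def next_batch(self, now: float) -> Tuple[List[Any], List[Any]]:
        """Pop up to max_batch_size servable requests, most urgent first.

        Returns (batch, expired), where expired holds the requests that
        could not finish before their deadline even if started right now.
        """
        batch: List[Any] = []
        expired: List[Any] = []
        while self._heap and len(batch) < self.max_batch_size:
            entry = heapq.heappop(self._heap)
            if now + self.est_latency_s > entry.deadline:
                expired.append(entry.payload)
            else:
                batch.append(entry.payload)
        return batch, expired
```

In a real deployment, I imagine a background loop would poll `should_flush` with the current time and pass the result of `next_batch` to the model, but I am not sure how best to wire that into Ray Serve.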