objectives “*” It’s actively being worked. See Utility is in the Eye of the User: A Critique of NLP Leaderboards (Ethayarajh and Jurafsky, EMNLP 2020) ML in research vs. in production
costs while adapting to the dynamic workload. Using a set of model variants simultaneously provides higher average accuracy compared to having one variant. Inference Serving Systems should consider accuracy, latency, and cost at the same time.
costs while adapting to the dynamic workload. Using a set of model variants simultaneously provides higher average accuracy compared to having one variant. Inference Serving Systems should consider accuracy, latency, and cost at the same time. InfAdapter!
Switching! Previous works INFaaS and Model-Switch have proven that there is a big a latency-accuracy- resource footprint tradeoffs of models trained for the same task
Switching! Previous works INFaaS and Model-Switch have proven that there is a big a latency-accuracy- resource footprint tradeoffs of models trained for the same task
Switching! Previous works INFaaS and Model-Switch have proven that there is a big a latency-accuracy- resource footprint tradeoffs of models trained for the same task
set of autoscaling, scheduling, observability tools (e.g. CPU usage) 4. APIs for changing the current AutoScaling algorithms 1. Industry standard ML server 2. Have the ability make inference graph 3. Rest and GRPC endpoints 4. Have many of the features we need like monitoring stack out of the box How to navigate Model Variants