This deck covers the basics of KServe, including:
* KServe overview
* Explanation of the major KServe components
* Under the hood of autoscaling: the Knative Pod Autoscaler (KPA)
Agenda
• KServe Overview
• KServe Components
• Inference Service
• Predictor
• Autoscaling with the Knative Pod Autoscaler (KPA)
• ML inference with KServe: examples
KServe Features
• Scale to and from zero
• Request-based autoscaling
• Batching
• Request/response logging
• Traffic management
• Security with AuthN/AuthZ
• Distributed tracing
• Out-of-the-box metrics
KServe Control Plane
• Responsible for reconciling the InferenceService custom resources.
• Creates the Knative serverless deployments for the predictor and transformer, enabling autoscaling based on the incoming request workload, including scaling down to zero when no traffic is received.
Predictor (https://kserve.github.io/website/master/)
• Queue Proxy: measures and limits concurrency to the user's application.
• Model Server: deploys, manages, and serves machine learning models.
• Storage Initializer: retrieves and prepares machine learning models from various storage backends, such as Amazon S3.
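As a minimal sketch, a predictor is declared in an InferenceService manifest. The example below follows the KServe getting-started sklearn-iris example; the model name and `storageUri` are taken from that guide and should be adjusted for your own model:

```yaml
# Minimal InferenceService with only a predictor (sketch based on the
# KServe getting-started example; adjust name/format/storageUri as needed).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```

The Storage Initializer downloads the model from `storageUri` into the pod before the Model Server starts, and the Queue Proxy sits in front of the server container to meter concurrency.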
Transformer (https://kserve.github.io/website/master/)
• Queue Proxy: measures and limits concurrency to the user's application.
• Model Server: preprocesses input data and postprocesses output predictions, so custom logic and data transformations integrate seamlessly with the deployed machine learning models.
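A transformer is added as its own section alongside the predictor in the InferenceService spec. The image and storage URI below are hypothetical placeholders, not values from this deck:

```yaml
# Sketch of an InferenceService with a custom transformer in front of the
# predictor. <your-transformer-image> and <your-model-storage-uri> are
# placeholders you must supply.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: model-with-transformer
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: "<your-model-storage-uri>"
  transformer:
    containers:
      - name: kserve-container
        image: "<your-transformer-image>"
```

Requests flow through the transformer's preprocess step, then to the predictor, and the transformer postprocesses the predictions before they are returned.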
KServe Control Plane
[Diagram: the KServe Controller reconciles an InferenceService into either a Knative Service → Knative Revision → Deployment (Serverless mode) or directly into a Deployment (Raw Deployment mode).]
Knative Components (https://knative.dev/docs/serving/)
[Diagram: the same reconciliation flow, highlighting the Knative Service, Knative Revision, and Deployment resources on the Serverless path.]
Knative Serving (https://knative.dev/docs/serving/)
[Diagram: the same flow, highlighting the portion managed by Knative Serving (Service → Revision → Deployment).]
Primary Knative Serving Resources
• Service: automatically manages the whole lifecycle of your workload.
• Route: maps a network endpoint to one or more revisions.
• Configuration: maintains the desired state for your deployment.
• Revision: a point-in-time snapshot of the code and configuration for each modification made to the workload.
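For reference, a Knative Service manifest looks like the following (a sketch based on the hello-world example in the Knative docs). Applying it causes Knative to create the Configuration, Revision, and Route for you:

```yaml
# Minimal Knative Service (sketch from the Knative hello-world sample).
# Knative derives the Configuration, Revision, and Route from this one resource.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "World"
```

Each edit to `spec.template` stamps out a new immutable Revision, and the Route shifts traffic between revisions.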
Autoscale Sample (https://github.com/dewitt/knative-docs/tree/master/serving/samples/autoscale-go)
Ramp up traffic to maintain 10 in-flight requests.
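In the autoscale sample, the "10 in-flight requests" target is set with the Knative `autoscaling.knative.dev/target` annotation on the revision template. The manifest below is a sketch; the container image is a placeholder rather than the exact image from the sample:

```yaml
# Sketch: KPA concurrency target of 10 in-flight requests per replica.
# <autoscale-sample-image> is a placeholder for the sample's container image.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
spec:
  template:
    metadata:
      annotations:
        # KPA scales replicas to keep ~10 concurrent requests per pod.
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
        - image: "<autoscale-sample-image>"
```

Under load, the KPA adds replicas until the observed per-pod concurrency falls back toward the target, and scales to zero when traffic stops.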
Difference between KPA and HPA (https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/#autoscaling)
Knative Pod Autoscaler (KPA)
• Part of the Knative Serving core; enabled by default once Knative Serving is installed.
• Supports scale-to-zero functionality.
• Does not support CPU-based autoscaling.
Horizontal Pod Autoscaler (HPA)
• Not part of the Knative Serving core; must be enabled after Knative Serving installation.
• Does not support scale-to-zero functionality.
• Supports CPU-based autoscaling.
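The autoscaler class is chosen per revision with Knative annotations. A sketch of opting into the HPA with a CPU target (the image is a placeholder; the annotation keys are from the Knative autoscaling docs):

```yaml
# Sketch: selecting the HPA autoscaler class with a CPU utilization target.
# Requires the HPA extension to be installed alongside Knative Serving.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cpu-scaled-service
spec:
  template:
    metadata:
      annotations:
        # Use the HPA instead of the default KPA for this revision.
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        # Scale on CPU, targeting ~80% utilization per pod.
        autoscaling.knative.dev/metric: "cpu"
        autoscaling.knative.dev/target: "80"
    spec:
      containers:
        - image: "<your-image>"
```

Omitting these annotations leaves the default KPA in charge, which scales on request concurrency and supports scale-to-zero.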
We have covered the Knative Serving part… (https://knative.dev/docs/serving/)
[Diagram: the same reconciliation flow, with the Knative Serving resources now covered.]
Up next: Inference Service
Question: Do we have to deal with the complexity of Knative ourselves?
Answer: No! All we need is an InferenceService.
First Inference Service: load test (under the hood)
https://kserve.github.io/website/master/get_started/first_isvc/#5-perform-inference
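The linked guide performs inference by POSTing a v1-protocol JSON payload to the InferenceService's predict endpoint. Assuming the sklearn-iris model from that guide, the request body wraps rows of feature values in an `instances` array:

```json
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
```

Replaying this payload repeatedly with a load-testing tool drives up per-pod concurrency, which is what triggers the KPA scaling behavior discussed earlier.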