• Repeatable Builds and Workflows • Application Portability • High Degree of Control over Software • Faster Development Cycle • Reduced dev-ops load • Improved Infrastructure Utilization
8080 8080 Volume Node • A pod is a set of co-located containers • Created by a declarative specification supplied to the master • Each pod has its own IP address • Volumes can be local or network-attached
and stateful workloads – Streamlined developer experience – Reduced operational costs – Improved infrastructure utilization • Kubernetes and the Container Ecosystem – Lots of addon services: third-party logging, monitoring, and security tools – For example, the Istio project, announced May 24, by IBM, Google and Lyft, provides a service mesh for authenticating, authorizing, tracing, and timing, and rate-limiting container-to-container communication, and more.
server: component to stage local files • Spark Shuffle service: component to store shuffle data for dynamic allocation • ThirdParty/CustomResources: extend Kubernetes API with Spark Knowledge Shuffle Service SparkJob API Endpoint
Admission Control RBAC • Launch Spark Jobs as a particular user into a specific namespace • RBAC and Namespace-level resource quotas • Audit logging for clusters • Several monitoring solutions to see node, cluster and pod-level statistics
the driver for job Deep Dive • Spark Submit submits job to K8s • K8s schedules the driver for job • Driver requests executors as needed kubernetes cluster apiserver scheduler spark driver create executor pods
the driver for job Deep Dive • Spark Submit submits job to K8s • K8s schedules the driver for job • Driver requests executors as needed • Executors scheduled and created kubernetes cluster apiserver scheduler spark driver schedule executor pods executors
the driver for job Deep Dive • Spark Submit submits job to K8s • K8s schedules the driver for job • Driver requests executors as needed • Executors scheduled and created • Executors run tasks kubernetes cluster apiserver scheduler spark driver executors
the driver for job Deep Dive • Spark Submit submits job to K8s • K8s schedules the driver for job • Driver requests executors as needed • Executors scheduled and created • Executors run tasks • Driver “completes” job and persists logs kubernetes cluster apiserver scheduler spark driver
Cluster Mode Java/Scala Support Dynamic Allocation Local File Staging High Availability Spark SQL GraphX MLlib Dec 2016 Development Began Mar 2017 Alpha Release June 2017 Beta Release Nov 2016 Design = supported but untested = not yet supported