Patterns and Pains of Migrating Legacy Applications to Kubernetes

Josef Adersberger & Michael Frank, QAware
Robert Bichler, Allianz Germany
@adersberger @qaware

Michael Frank, Lead Developer, QAware
Robert Bichler, Project Manager, Allianz Germany
Josef Adersberger, Architect, QAware

CIO: “Let’s bring all our web applications onto a cloud native platform”

COSTS, AVAILABILITY, PRODUCTIVITY
Digitalization => Agile => Cloud Native Platforms

Priorities:
(1) Time (1.5 years)
(2) Ops cost savings
(3) Migration costs

WE WERE BRAVE

WE FELT PAIN

WE DISCOVERED PATTERNS

WE WERE SUCCESSFUL
❏ All 152 legacy applications migrated and in production within 17 months
❏ All security-hardened and modernized to containerized 12-factor apps
❏ Benefits leveraged: strong business case, higher availability, more agile teams

The Architect’s Point of View

Patterns for success

Visibility

The Cloudalyzer
[Diagram labels: Questionnaires, Jira, XLS, static analysis, IBM migration tool, QAvalidator, SonarQube, OWASP scanner, jQAssistant, EAM tool, system properties, basic Tour-de-Migration, …, migration database, Tableau analysis, migration tasks]

Questionnaire: Typical questions
• Technology stack (e.g. OS, appserver, JVM)
• Required resources (memory, CPU cores)
• Writes to storage (local/remote storage, write mode, volume)
• Special requirements (native libs, special hardware)
• Inbound and outbound protocols (protocol stack, TLS, multicast, dynamic ports)
• Ability to execute (regression/load tests, business owner, dev know-how, release cycle, end of life)
• Client authentication (e.g. SSO, login, certificates)

Emergent design of cloud native software landscapes

Architecting hundreds of applications
• Application Blueprint: Describes the target architecture and some rules & principles
• Migration Cookbook: Guidance on how to migrate the applications based on the application blueprint. Single source of truth & know-how externalization
• Tour-de-Migration: Visiting all applications and collecting open issues
• GoLive Readiness Checklist: Criteria to be checked before GoLive
[Timeline, Q1/17 to Q2/18: Application Blueprint, Migration Cookbook, Tour-de-Migration, GoLive Readiness Checklist; tracks: Cloud Platform Setup, Application Migration]

The Blueprint
[Diagram, before: clients, HTTPD web layer, application on J2EE 1.4 appserver and JVM 1.6, in the on-prem data center together with DB, MQ, host, batch, FS, RACF, ESB. Protocols: TLS 1.0+, TCP-Binary, WS, REST, C:D, LDAP, Corba, SMTP, FTP, NAS, …]
[Diagram, after: clients, AWS web layer, security gateway, outer applications and inner applications on JEE 7 appserver, JVM 8, Docker, Kubernetes / OpenShift, AWS; DB, MQ, host, batch, FS, RACF, ESB in the on-prem data center. Labels: TLS 1.2; all TLS 1.2; all 2-way TLS 1.2 & OIDC identity token; only data in transit]

[Diagram, before: clients, monolith, backend. After: clients, security gateway, outer applications, inner applications, backend. Questions 1+2 and 3 are marked on the after side.]
1) How to enhance cloud nativeness?
2) How to cut the monolith?
3) How to obtain an identity token?

A sweet spot for legacy apps: Cloud Friendly Apps
Put the monolith into a container: do not cut, do not enhance with features in parallel
… and enhance the application according to the 12 factors

Sidecars to the rescue

Container patterns applied
• Sidecar: Enhance container behaviour
• Ambassador: Proxy communication
• Adapter: Provide standardized interface
Applied for:
• Log extraction
• Task scheduling
• Configuration (ConfigMaps & Secrets to files)
• mTLS tunnel
• Circuit breaking
• Request monitoring
[Diagram: a pod with the application container, a pattern container and other containers]
Source: “Design patterns for container-based distributed systems”. Brendan Burns, David Oppenheimer. 2016
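
Circuit breaking is one of the behaviours listed for the container patterns. The following is a minimal sketch of the idea only, not the project's ambassador: the class name, the thresholds and the injectable clock are assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker; thresholds are illustrative."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Fail fast instead of waiting for another backend timeout.
                raise CircuitOpenError("backend call skipped, circuit is open")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # A successful call closes the circuit and resets the counter.
        self.failures = 0
        self.opened_at = None
        return result
```

Placing such logic in an ambassador container rather than in the application keeps the legacy code untouched, which matches the intent of proxying communication.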

Anti-pain rule: Don’t cut the monolith

[Diagram, before: clients, monolith, backend. After: clients, security gateway, monolith plus “some magic sauce”, backend. The monolith stays in one piece.]

Security service to the rescue
[Diagram, before: clients, monolith, backend. After: clients, security gateway, monolith, security service, token provider, IAM systems, backend.]
Adapting multiple authentication mechanisms to a uniform OIDC token.

Kubernetes constraints
Initially we thought we would run into k8s restrictions on our infrastructure, like:
‣ No support for multicast
‣ No RWX PVC available
We did. But all required refactorings were of moderate effort and led to a better architecture.

The Lead Developer’s Point of View

The almighty legacy framework
• “Worry-free package framework” from the early 2000s with about 500 kLOC, 0% test coverage and multiple forks
• Strategies:
  – the hard way: consolidate forks, migrate manually and increase coverage
  – decorate with ambassadors, sidecars and adapters
  – do not migrate parts and replace that API within the applications
[Diagram: application on top of the almighty legacy framework, J2EE 1.4 appserver, JVM 1.6]
Migration changes:
• from J2EE 1.4 to JEE 7 and Java 6 to 8
• add identity token check and relay
• modify session handling (synchronization)
• modify logging (to STDOUT)
• modify configuration (overwrite from ConfigMap)
• enforce TLS 1.2
• place circuit breakers
• predefined liveness and readiness probes

TIMEOUTS

Timeouts: The pain
• Kinds: getConnection, connect, read, connection TTL/keepAlive [diagram: Con Pool, Socket, Server]
• Timeouts often too high. This ...
  – causes bad user experience
  – hurts the stability of your entire cloud
• Unable to distinguish errors from legitimate waits
• Diminishes self healing capabilities
• Promotes cascading failures

Timeouts: Recommendations
• Keep timeouts within the following ranges
  – 1-3s for getConnection & connect
  – 3-60s for socket/read - aim as low as possible
  – 1-3min for TTL/KeepAlive of pooled connections
• Allow for dynamic DNS changes and dynamic scaling of backend services
• Tradeoff between reaction time and performance
• Cascade timeouts
  – outer layer highest
  – inner layer lowest
[Diagram: 60s, 57s, 54s, 51s from outer to inner layer]
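
The cascade rule can be made concrete with a few lines of arithmetic. This is a sketch under assumptions: the function name is hypothetical, and the fixed 3s step is inferred from the 60s, 57s, 54s, 51s figures on the slide rather than stated by the authors.

```python
def cascade_timeouts(outer_timeout_s, layers, step_s):
    """Return one timeout per layer, outermost first.

    Each inner layer gets a timeout that is step_s lower than the layer
    calling it, so the inner call gives up before its caller does.
    """
    if layers < 1:
        raise ValueError("layers must be at least 1")
    timeouts = [outer_timeout_s - i * step_s for i in range(layers)]
    if timeouts[-1] <= 0:
        raise ValueError("innermost timeout must stay positive")
    return timeouts


# Four layers, starting at 60s with an assumed 3s step per layer.
print(cascade_timeouts(60, 4, 3))  # [60, 57, 54, 51]
```

With this ordering the innermost call fails first and the error can propagate outwards, instead of an outer layer timing out while inner layers are still waiting.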

LATENCY

Latency
• Pain: Dramatic increase in latency. You can't scale away latency!
  – Every layer and new infrastructure component adds processing time
  – Everything TLS 1.2 secured adds processing time
  – Physical distance: Cloud -> OnPrem
• Heaviest impact on n+1 patterns in applications
  – Adjust batch/fetch size
  – Parallel fetch
  – Ultima ratio: on-prem (lightweight) service layer close to DB
• General
  – Performance experts in support team
  – Caching
  – Use diagnosability tools...
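
Why n+1 patterns suffer most from added latency can be shown with a small simulation. Everything here is an assumption for illustration: `FakeRemoteDb` is a stand-in that only counts round trips, and the 20 ms round trip, 100 rows and batch size of 50 are invented numbers, not measurements from the project.

```python
class FakeRemoteDb:
    """Hypothetical stand-in for an on-prem DB reached from the cloud."""

    def __init__(self, row_count, round_trip_ms):
        self.rows = {i: f"row-{i}" for i in range(row_count)}
        self.round_trip_ms = round_trip_ms
        self.round_trips = 0

    def list_ids(self):
        self.round_trips += 1
        return list(self.rows)

    def fetch(self, ids):
        # One round trip regardless of how many ids are requested.
        self.round_trips += 1
        return [self.rows[i] for i in ids]

    def network_wait_ms(self):
        return self.round_trips * self.round_trip_ms


def load_n_plus_one(db):
    # 1 query for the ids, then 1 query per row.
    return [db.fetch([i])[0] for i in db.list_ids()]


def load_batched(db, batch_size):
    # 1 query for the ids, then 1 query per batch.
    ids = db.list_ids()
    rows = []
    for start in range(0, len(ids), batch_size):
        rows.extend(db.fetch(ids[start:start + batch_size]))
    return rows


slow = FakeRemoteDb(row_count=100, round_trip_ms=20)
load_n_plus_one(slow)
fast = FakeRemoteDb(row_count=100, round_trip_ms=20)
load_batched(fast, batch_size=50)
print(slow.round_trips, slow.network_wait_ms())  # 101 2020
print(fast.round_trips, fast.network_wait_ms())  # 3 60
```

The per-round-trip cost is fixed by the network path, so adding pods does not help; only reducing the number of round trips does.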

DIAGNOSABILITY

Diagnosability
1. Early on - diagnose cloud platform issues upfront
2. Holistic - monitor and correlate everything (infrastructure & apps, multiple levels, metrics & logs & traces)
3. Mandatory - everyone has to use it
4. Automatically - auto-instrumentation not involving devs

Application Diagnosability?
Metrics
• High effort to instrument for valuable insights
• Scalability unclear for hundreds of applications
• Applications have no time to run their own Prometheus instance
Events / Logs
• Scalability unclear (a lot of events lost)
• Applications have no time to run their own EFK instance
• Non-standardized log format requires custom log rewrite adapter but no fluentd DaemonSet
Traces
• Scalability unclear for hundreds of applications (Jaeger & Zipkin)
• Applications have no time to run their own instance
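
What a custom log rewrite adapter for a non-standardized log format has to do can be sketched as follows. The legacy line format, the field names and the JSON output shape are all invented for this example; the deck does not describe the real formats.

```python
import json
import re

# Hypothetical legacy line format, invented for this sketch:
#   2018-03-01 12:00:05,123 [ERROR] OrderService - payment failed
LEGACY_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>[A-Z]+)\] (?P<logger>\S+) - (?P<message>.*)$"
)


def rewrite_line(line):
    """Turn one legacy log line into a JSON document with uniform fields."""
    text = line.rstrip("\n")
    match = LEGACY_LINE.match(text)
    if match is None:
        # Keep unparseable lines (e.g. stack trace continuations)
        # instead of dropping them.
        return json.dumps({"level": "UNKNOWN", "message": text})
    return json.dumps(match.groupdict())
```

Every application with its own format needs its own pattern, which is part of the effort the slide points at.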

Application Diagnosability
Metrics, Events / Logs, Traces: … use APM tools like Dynatrace and Instana
Want to move fast? Buy first, reduce cost later

SESSION STATE

Session state
1. Session Stickiness: not within the cloud!
2. Session Persistence
  • Existing DB: perf impact too high ☹
  • Redis: no TLS out of the box and infrastructure required ☹
3. Session Synchronization
  • App-Server: no dynamic peer lookup within k8s ☹
  • Hazelcast: TLS only in paid enterprise edition ☹
  • ...

Session synchronization with Ignite
• Apache Ignite as in-memory data grid
  – Embedded within application or standalone (in sidecar)
  – Cumbersome but working k8s peer lookup
• Look out for ...
  – Java serialization
  – Legacy frameworks with custom session handling
  – Prevent generating sessions for e.g. health check requests
  – Applications putting large things into the “session” and misusing the session as a cache

#@!!#@$

Other technical pain points
Pain: Legacy crypto without TLS 1.2 and SNI support (e.g. Java 1.6)
Pattern:
• Find matching cipher suites
• Add a security proxy
Pain: Legacy apps violating HTTP standards
Pattern: Refactor
Pain: Access to source URLs in redirect loops (e.g. IDP login)
Pattern: Use x-forwarded header and provide a corresponding filter
Pain: No automated test suites
Pattern:
• Automated high-level tests
• Test generation (e.g. evosuite)?
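
The x-forwarded pattern for redirect loops can be illustrated with a small helper that rebuilds the URL the client originally requested. This is a sketch, not the project's filter: the function name is hypothetical, and the choice of the `X-Forwarded-Proto` and `X-Forwarded-Host` headers is an assumption, since the slide only says “x-forwarded header”.

```python
def external_url(headers, internal_scheme, internal_host, path):
    """Rebuild the client-facing URL, preferring X-Forwarded-* headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # A chain of proxies may produce comma-separated values; the first
    # entry is taken as the one closest to the client.
    scheme = lowered.get("x-forwarded-proto", internal_scheme).split(",")[0].strip()
    host = lowered.get("x-forwarded-host", internal_host).split(",")[0].strip()
    return f"{scheme}://{host}{path}"


# Behind a gateway the pod sees its internal address, but the redirect
# target must use the address the browser used.
print(external_url(
    {"X-Forwarded-Proto": "https", "X-Forwarded-Host": "portal.example.com"},
    "http", "app-pod:8080", "/login",
))  # https://portal.example.com/login
```

Such headers should only be honoured when they come from a trusted proxy, because a client can set them to arbitrary values.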

The Project Manager’s Point of View

Patterns for success

Management support
❏ Strong management support
❏ Clear scope
❏ Courage to drive the change to cloud native development

Project Marketing & Motivation Identification & Celebration

Co-Location space
One LEAP-Area
❏ Support team
❏ Industrialization team
❏ In case of required support: Migration team

Industrialization

[Diagram labels: ARCHITECTURE TEAM, INDUSTRIALIZATION TEAM, SUPPORT TEAM, DOZENS OF MIGRATION PROJECTS RUNNING IN PARALLEL (organized in release trains)]
‣ Training sessions
‣ Support sessions
‣ Co-Location & remote
‣ Guidance / best practice sharing (cookbook, sample application)
‣ Unified development environment (via GitHub)
‣ Standard base images
‣ Pre-migrated frameworks
‣ Solutions: Security service, ambassadors
‣ Application blueprint
‣ Migration database
‣ Feedback

Transparency & information radiators
[Dashboard labels: App-Support, Activities & Milestones, Quality, GoLive Planning, Operational]
