

Patterns and Pains of Migrating Legacy Applications to Kubernetes

Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs, and higher elasticity and resiliency in production. Kubernetes is the place to be for cloud native apps. But what can you do if you have no shiny new cloud native apps, just a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes, you can!

We’re facing the challenge of migrating hundreds of JEE legacy applications from a German blue-chip company onto a Kubernetes cluster within one year.

The talk covers the lessons we've learned: the best practices and pitfalls we've discovered along the way.

Josef Adersberger

August 31, 2018



Transcript

  1. Patterns and Pains of Migrating Legacy Applications to Kubernetes

    Josef Adersberger & Michael Frank, QAware; Robert Bichler, Allianz Germany. @adersberger @qaware
  2. WE WERE SUCCESSFUL

    ❏ All 152 legacy applications migrated and in production within 17 months
    ❏ All security-hardened and modernized to containerized 12-factor apps
    ❏ Benefits leveraged: strong business case, higher availability, more agile teams
  3. The Cloudalyzer

    Diagram components: Tableau analysis, migration database, QAvalidator, SonarQube, EAM tool, questionnaires, JIRA, XLS, static analysis, IBM migration tool, …, migration tasks, basic tour-de-migration, system properties, OWASP scanner, jQAssistant
  4. Questionnaire: typical questions

    • Technology stack (e.g. OS, app server, JVM)
    • Required resources (memory, CPU cores)
    • Writes to storage (local/remote storage, write mode, volume)
    • Special requirements (native libs, special hardware)
    • Inbound and outbound protocols (protocol stack, TLS, multicast, dynamic ports)
    • Ability to execute (regression/load tests, business owner, dev know-how, release cycle, end of life)
    • Client authentication (e.g. SSO, login, certificates)
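Questionnaire answers are easiest to triage when they are captured as structured data. A minimal sketch, assuming a hypothetical `QuestionnaireAnswers` record that holds only a subset of the questions above, plus illustrative rules in a hypothetical `MigrationTriage.findings` method that flag known Kubernetes pain points such as multicast and shared writes. This is not the schema of the authors' migration database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record holding a subset of the questionnaire answers for one application.
record QuestionnaireAnswers(
        String appName,
        String appServer,
        int memoryMb,
        boolean writesToSharedStorage, // several instances write to the same volume (would need RWX)
        boolean usesMulticast,
        boolean needsNativeLibs,
        boolean hasRegressionTests) {}

class MigrationTriage {
    // Returns findings that should be discussed before the application is migrated.
    // The rules are illustrative assumptions, not an official checklist.
    static List<String> findings(QuestionnaireAnswers a) {
        List<String> result = new ArrayList<>();
        if (a.usesMulticast()) {
            result.add("multicast: replace with a unicast or registry-based mechanism");
        }
        if (a.writesToSharedStorage()) {
            result.add("shared writes: check for ReadWriteMany volumes or move state to a backend");
        }
        if (a.needsNativeLibs()) {
            result.add("native libs: must be packaged into the container base image");
        }
        if (!a.hasRegressionTests()) {
            result.add("no regression tests: plan automated high-level tests first");
        }
        return result;
    }
}
```

Rules like these only pre-sort the portfolio; each finding still has to be confirmed with the application team.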
  5. Architecting hundreds of applications

    • Application Blueprint: describes the target architecture and some rules & principles
    • Migration Cookbook: guidance on how to migrate the applications based on the application blueprint; single source of truth & know-how externalization
    • Tour-de-Migration: visiting all applications and collecting open issues
    • GoLive Readiness Checklist: criteria to be checked before go-live
    Timeline (diagram, Q1/17 to Q2/18): application blueprint, migration cookbook, tour-de-migration, go-live readiness checklist; tracks: cloud platform setup, application migration
  6. The Blueprint

    Before (on-prem data center): clients (TLS 1.0+) → HTTPD web layer → application on a J2EE 1.4 app server and JVM 1.6 → DB, MQ, host, batch, FS, RACF, ESB via TCP-binary, WS, REST, C:D, LDAP, CORBA, SMTP, FTP, NAS, …
    After: clients (TLS 1.2) → AWS web layer → security gateway → outer applications → inner applications on Kubernetes / OpenShift (Docker, JVM 8, JEE 7 app server); DB, MQ, host, batch, FS, RACF and ESB remain in the on-prem data center
    • All TLS 1.2; all 2-way TLS 1.2 & OIDC identity token
    • Only data in transit
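Inside a JVM, the blueprint's "all 2-way TLS 1.2" rule can be expressed with the standard JSSE API. A minimal sketch, assuming the hypothetical helper name `TlsPolicy`; it falls back to the JVM's default key and trust material, whereas a real service would load its own keystore and the trusted CA certificates. In the blueprint the same policy could equally be enforced by a proxy or ambassador container in front of a legacy app.

```java
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

class TlsPolicy {
    // Builds SSL parameters that allow only TLS 1.2 and optionally require a client certificate (2-way TLS).
    static SSLParameters tls12Only(boolean requireClientCert) {
        try {
            SSLContext ctx = SSLContext.getInstance("TLSv1.2");
            // null key managers, trust managers and random: use the JVM defaults (assumption for the sketch).
            ctx.init(null, null, null);
            SSLParameters params = ctx.getDefaultSSLParameters();
            params.setProtocols(new String[] {"TLSv1.2"}); // no TLS 1.0/1.1 fallback
            params.setNeedClientAuth(requireClientCert);   // true: handshake fails without a client cert
            return params;
        } catch (NoSuchAlgorithmException | KeyManagementException e) {
            throw new IllegalStateException("TLS 1.2 not available in this JVM", e);
        }
    }
}
```

The resulting parameters can be applied to a server socket with `SSLServerSocket.setSSLParameters(params)` or to an `SSLEngine` via its `setSSLParameters` method.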
  7. BEFORE / AFTER

    Before: clients → monolith → backend. After: clients → security gateway → outer applications → inner applications → backend.
    Open questions: 1) How to enhance cloud nativeness? 2) How to cut the monolith? 3) How to obtain an identity token?
  9. A sweet spot for legacy apps: Cloud Friendly Apps

    Put the monolith into a container: do not cut, do not enhance with features in parallel … and enhance the application according to the 12 factors.
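Factor III of the 12-factor app (store config in the environment) is a typical first enhancement after containerizing. A minimal sketch, assuming a hypothetical `LayeredConfig` class and a naming convention chosen for illustration (`db.pool-size` → `DB_POOL_SIZE`): the defaults packaged with the legacy app stay in place, and environment variables, for example injected from a ConfigMap, override them.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

class LayeredConfig {
    private final Properties defaults;     // packaged with the application (legacy config file)
    private final Map<String, String> env; // e.g. System.getenv(), populated from a ConfigMap

    LayeredConfig(Properties defaults, Map<String, String> env) {
        this.defaults = defaults;
        this.env = env;
    }

    // Hypothetical convention: "db.pool-size" -> "DB_POOL_SIZE".
    static String envName(String key) {
        return key.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
    }

    // The environment wins over packaged defaults; returns null if neither defines the key.
    String get(String key) {
        String fromEnv = env.get(envName(key));
        return fromEnv != null ? fromEnv : defaults.getProperty(key);
    }

    static LayeredConfig fromSystem(Properties defaults) {
        return new LayeredConfig(defaults, System.getenv());
    }
}
```

Keeping the packaged defaults means the container image still starts without any ConfigMap, while each stage overrides only what differs.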
  10. Container patterns applied

    Pod (diagram): application container, pattern container, other container
    • Sidecar: enhance container behaviour
    • Ambassador: proxy communication
    • Adapter: provide a standardized interface
    Applied for: log extraction, task scheduling, configuration (ConfigMaps & Secrets to files), mTLS tunnel, circuit breaking, request monitoring
    “Design patterns for container-based distributed systems”. Brendan Burns, David Oppenheimer. 2016
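Circuit breaking appears here as a pattern-container concern, i.e. handled next to the application; the state machine behind it can also be sketched in-process. A minimal sketch, assuming a hypothetical `CircuitBreaker` class with a consecutive-failure threshold and a fixed open period; the injectable clock exists only to make the behaviour testable, and the sketch does not limit concurrent trial calls in the half-open state.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// CLOSED -> OPEN after N consecutive failures; OPEN -> HALF_OPEN after the open period;
// HALF_OPEN -> CLOSED on success, or back to OPEN on failure.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in production
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN; // let a trial call through
        }
        return state;
    }

    // Runs the action unless the circuit is open; while open, the fallback is returned immediately.
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get();
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

Returning the fallback immediately while the circuit is open keeps one failing backend from tying up request threads, which is how circuit breakers limit cascading failures.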
  12. Anti-pain rule: Don’t cut the monolith

    Before: clients → monolith → backend. After: clients → security gateway → some magic sauce → monolith → backend.
  14. Security service to the rescue

    Before: clients → monolith → backend. After: clients → security gateway → security service (token provider, IAM systems) → monolith → backend.
    Adapting multiple authentication mechanisms to a uniform OIDC token.
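Mapping many authentication mechanisms onto one uniform identity is an instance of the adapter pattern. A minimal sketch in which all type names (`Credential`, `AuthAdapter`, `Identity`, `SecurityService`, `CertificateAdapter`) are hypothetical; real credential validation against the IAM systems, and issuing and signing the actual OIDC identity token in the token provider, are out of scope and not shown.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical credential types that legacy clients present today.
interface Credential {}
record SsoCookie(String value) implements Credential {}
record UsernamePassword(String user, String password) implements Credential {}
record ClientCertificate(String subjectDn) implements Credential {}

// Uniform identity that a token provider would turn into a signed OIDC identity token (not shown).
record Identity(String subject, String authMethod) {}

// One adapter per existing authentication mechanism.
interface AuthAdapter {
    boolean supports(Credential c);
    Optional<Identity> authenticate(Credential c);
}

class SecurityService {
    private final List<AuthAdapter> adapters;

    SecurityService(List<AuthAdapter> adapters) {
        this.adapters = adapters;
    }

    // Uses the first adapter that understands the credential; empty if none does or authentication fails.
    Optional<Identity> identify(Credential c) {
        return adapters.stream()
                .filter(a -> a.supports(c))
                .findFirst()
                .flatMap(a -> a.authenticate(c));
    }
}

// Example adapter: assumes the certificate was already verified during the TLS handshake.
class CertificateAdapter implements AuthAdapter {
    public boolean supports(Credential c) {
        return c instanceof ClientCertificate;
    }

    public Optional<Identity> authenticate(Credential c) {
        ClientCertificate cert = (ClientCertificate) c;
        return Optional.of(new Identity(cert.subjectDn(), "x509"));
    }
}
```

The monolith then only has to understand one token format, no matter which of the old mechanisms the client used.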
  15. Kubernetes constraints

    Initially we thought we would run into k8s restrictions on our infrastructure, like:
    ‣ No support for multicast
    ‣ No RWX PVC available
    We did. But all required refactorings took moderate effort and led to a better architecture.
  16. The almighty legacy framework

    • “Worry-free package framework” from the early 2000s with about 500 kLOC, 0% test coverage and multiple forks
    • Strategies:
      • the hard way: consolidate forks, migrate manually and increase coverage
      • decorate with ambassadors, sidecars and adapters
      • do not migrate parts and replace that API within the applications
    Diagram: application on top of the almighty legacy framework, J2EE 1.4 app server and JVM 1.6. Framework migration tasks:
    • from J2EE 1.4 to JEE 7 and Java 6 to 8
    • add identity token check and relay
    • modify session handling (synchronization)
    • modify logging (to STDOUT)
    • modify configuration (overwrite from ConfigMap)
    • enforce TLS 1.2
    • place circuit breakers
    • predefined liveness and readiness probes
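Predefined liveness and readiness probes can be as small as two HTTP paths. A minimal sketch using the JDK's built-in `com.sun.net.httpserver.HttpServer` instead of a JEE app server; the class name `HealthEndpoints`, the paths `/health/live` and `/health/ready`, and the status texts are assumptions. Kubernetes would reference such paths in the pod's `livenessProbe` and `readinessProbe` (`httpGet`) settings.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

class HealthEndpoints {
    private final AtomicBoolean ready = new AtomicBoolean(false);
    private HttpServer server;

    // Starts the probe server; port 0 picks a free ephemeral port.
    void start(int port) {
        try {
            server = HttpServer.create(new InetSocketAddress(port), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Liveness: the JVM answers at all; no backend checks, so a slow backend does not cause restarts.
        server.createContext("/health/live", ex -> respond(ex, 200, "UP"));
        // Readiness: 503 until warm-up (config, connection pools, ...) is done, so no traffic arrives too early.
        server.createContext("/health/ready", ex -> {
            boolean isReady = ready.get();
            respond(ex, isReady ? 200 : 503, isReady ? "READY" : "STARTING");
        });
        server.start();
    }

    void markReady() { ready.set(true); }

    int port() { return server.getAddress().getPort(); }

    void stop() { server.stop(0); }

    private static void respond(HttpExchange ex, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = ex.getResponseBody()) {
            os.write(bytes);
        }
    }
}
```

Probes served this way never create an HTTP session, which keeps them out of any session synchronization.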
  17. Timeouts: The pain

    • Kinds: getConnection, connect, read, connection TTL/keepAlive (diagram: connection pool, server socket)
    • Timeouts are often too high. This ...
      – causes bad user experience
      – hurts the stability of your entire cloud
    • Unable to distinguish errors from legitimate waits
    • Diminishes self-healing capabilities
    • Promotes cascading failures
  19. Timeouts: Recommendations

    • Keep timeouts within the following ranges:
      – 1-3 s for getConnection & connect
      – 3-60 s for socket/read; aim as low as possible
      – 1-3 min for TTL/keepAlive of pooled connections
    • Allow for dynamic DNS changes and dynamic scaling of backend services
    • Trade-off between reaction time and performance
    • Cascade timeouts: outer layer highest, inner layer lowest (e.g. 60 s, 57 s, 54 s, 51 s)
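The recommended ranges and the 60/57/54/51 s cascade translate directly into client settings. A minimal sketch with the JDK 11+ `java.net.http` client; the class name `TimeoutCascade`, the 3 s step and the 2 s connect timeout are illustrative choices within the slide's ranges. Note that `HttpRequest.timeout` bounds the wait for the response rather than each individual socket read, and the TTL/keepAlive of pooled connections is configured in whichever pool library the application uses, so it is not shown.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class TimeoutCascade {
    // Outer layer gets the highest budget, each inner layer a fixed step less, so the innermost call
    // times out first and every outer layer still has time left to react (e.g. render an error).
    static List<Duration> cascade(Duration outermost, Duration step, int layers) {
        List<Duration> result = new ArrayList<>();
        for (int i = 0; i < layers; i++) {
            result.add(outermost.minus(step.multipliedBy(i)));
        }
        return result;
    }

    // Connect timeout from the 1-3 s range.
    static HttpClient client() {
        return HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
    }

    // Per-request budget, e.g. taken from the cascade for this layer.
    static HttpRequest request(URI uri, Duration budget) {
        return HttpRequest.newBuilder(uri).timeout(budget).GET().build();
    }
}
```

With `cascade(Duration.ofSeconds(60), Duration.ofSeconds(3), 4)` the layers get 60 s, 57 s, 54 s and 51 s, matching the example on the slide.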
  20. Latency

    • Pain: dramatic increase in latency. You can't scale away latency!
      – Every layer and new infrastructure component adds processing time
      – Everything secured with TLS 1.2 adds processing time
      – Physical distance: cloud -> on-prem
    • Heaviest impact on n+1 patterns in applications
      – Adjust batch/fetch size
      – Parallel fetch
      – Ultima ratio: on-prem (lightweight) service layer close to the DB
    • General
      – Performance experts in the support team
      – Caching
      – Use diagnosability tools...
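Why n+1 patterns suffer most: every query pays the added round-trip latency. As an illustration with an assumed 5 ms of extra latency per round trip, loading 1,000 child rows one by one costs 1,001 round trips (about 5 s), while batches of 100 need 1 + 10 = 11 round trips (about 55 ms). A minimal sketch of batched loading; the class `BatchFetch`, its methods and the table name in the test are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchFetch {
    // Splits the keys into chunks so that each chunk is loaded with one query instead of one query per key.
    static <T> List<List<T>> chunks(List<T> keys, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            result.add(keys.subList(from, Math.min(from + batchSize, keys.size())));
        }
        return result;
    }

    // Parameterized IN query for one chunk; values are bound via PreparedStatement, never concatenated.
    static String inQuery(String baseSql, int params) {
        return baseSql + " WHERE id IN (" + String.join(", ", Collections.nCopies(params, "?")) + ")";
    }

    // Round trips: one parent query plus one per child row, versus one parent query plus one per chunk.
    static int roundTripsNPlusOne(int n) { return n + 1; }

    static int roundTripsBatched(int n, int batchSize) { return 1 + (n + batchSize - 1) / batchSize; }
}
```

For large result sets, JDBC's `Statement.setFetchSize` similarly reduces the number of round trips within a single query.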
  22. Diagnosability

    1. Early on: diagnose cloud platform issues upfront
    2. Holistic: monitor and correlate everything (infrastructure & apps, multiple levels, metrics & logs & traces)
    3. Mandatory: everyone has to use it
    4. Automatic: auto-instrumentation not involving devs
  23. Application Diagnosability?

    • Metrics: high effort to instrument for valuable insights; scalability unclear for hundreds of applications; applications have no time to run their own Prometheus instance
    • Traces: scalability unclear for hundreds of applications (Jaeger & Zipkin); applications have no time to run their own instance
    • Events / Logs: scalability unclear (a lot of events lost); applications have no time to run their own EFK instance; non-standardized log format requires a custom log rewrite adapter but no fluentd DaemonSet
  24. Application Diagnosability

    Metrics, events / logs, traces … use APM tools like Dynatrace and Instana. Want to move fast? Buy first, reduce cost later.
  25. Session state

    1. Session stickiness: not within the cloud!
    2. Session persistence
       • Existing DB: performance impact too high ☹
       • Redis: no TLS out of the box and infrastructure required ☹
    3. Session synchronization
       • App server: no dynamic peer lookup within k8s ☹
       • Hazelcast: TLS only in paid enterprise edition ☹
       • ...
  26. Session synchronization with Ignite

    • Apache Ignite as in-memory data grid
      – Embedded within the application or standalone (in a sidecar)
      – Cumbersome but working k8s peer lookup
    • Look out for ...
      – Java serialization
      – Legacy frameworks with custom session handling
      – Prevent generating sessions for e.g. health check requests
      – Applications putting large things into the “session” and misusing the session as a cache
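Before switching on session synchronization, it pays to audit what applications actually put into the session, because every attribute must survive Java serialization and gets replicated to the peers. A minimal sketch that does not use any Ignite API; the class `SessionAudit` and the size budget are hypothetical, and in a real application the attribute map would be built from `HttpSession.getAttributeNames()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class SessionAudit {
    // Serialized size in bytes (including the stream header), or -1 if the value is not serializable.
    static int serializedSize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (NotSerializableException e) {
            return -1; // cannot be replicated via Java serialization at all
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Flags attributes that are not serializable or exceed the given size budget.
    static Map<String, String> findings(Map<String, Object> attributes, int maxBytes) {
        Map<String, String> result = new LinkedHashMap<>();
        attributes.forEach((name, value) -> {
            int size = serializedSize(value);
            if (size < 0) {
                result.put(name, "not serializable");
            } else if (size > maxBytes) {
                result.put(name, size + " bytes, larger than " + maxBytes);
            }
        });
        return result;
    }
}
```

Large findings usually point to the "session as cache" misuse mentioned on the slide; those objects belong in a real cache rather than in replicated session state.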
  27. Other technical pain points (pain → pattern)

    • Legacy crypto without TLS 1.2 and SNI support (e.g. Java 1.6) → find matching cipher suites; add a security proxy
    • Legacy apps violating HTTP standards → refactor
    • Access to source URLs in redirect loops (e.g. IDP login) → use the X-Forwarded headers and provide a corresponding filter
    • No automated test suites → automated high-level tests; test generation (e.g. EvoSuite)?
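Behind the security gateway, an application sees its container-internal scheme and host, so redirect targets such as an IDP login callback come out wrong. A minimal sketch of the URL reconstruction such a filter would perform, assuming the de facto standard `X-Forwarded-Proto` and `X-Forwarded-Host` headers, lower-cased header names in the map, and the hypothetical class name `ExternalUrl`. These headers should only be trusted when they are set by your own gateway.

```java
import java.util.Map;

class ExternalUrl {
    // Rebuilds the URL as the client saw it, preferring X-Forwarded-* headers set by the gateway
    // over the scheme and host of the container-internal request.
    static String rebuild(Map<String, String> headers, String scheme, String host, String pathAndQuery) {
        String proto = firstValue(headers.get("x-forwarded-proto"), scheme);
        String fwdHost = firstValue(headers.get("x-forwarded-host"), host);
        return proto + "://" + fwdHost + pathAndQuery;
    }

    // With chained proxies the header holds a comma-separated list; the first entry is the client-facing one.
    private static String firstValue(String headerValue, String fallback) {
        if (headerValue == null || headerValue.isBlank()) {
            return fallback;
        }
        return headerValue.split(",")[0].trim();
    }
}
```

A servlet filter would call this for every request and use the result wherever the legacy code builds absolute redirect URLs.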
  28. Management support

    ❏ Strong management support
    ❏ Clear scope
    ❏ Courage to drive the change to cloud native development
  29. Co-Location space

    One LEAP area
    ❏ Support & industrialization team
    ❏ In case of required support: migration team
  30. Dozens of migration projects running in parallel (organized in release trains)

    Teams (diagram): architecture team, industrialization team, support team
    ‣ Training sessions
    ‣ Support sessions
    ‣ Co-location & remote
    ‣ Guidance / best practice sharing (cookbook, sample application)
    ‣ Unified development environment (via GitHub)
    ‣ Standard base images
    ‣ Pre-migrated frameworks
    ‣ Solutions: security service, ambassadors
    ‣ Application blueprint
    ‣ Migration database
    ‣ Feedback