daily mobility
Our technology empowers mobility providers to orchestrate existing and new modes of public transport. Together we create an effortless transport experience to make mobility services attractive to millions of users.
More mobility. Less traffic.
and tomorrow
About Mobimeo
• Founded in 2018 as a subsidiary of Deutsche Bahn AG and merged with parts of moovel Group GmbH in 2020
• Offices in Berlin and Hamburg
• 170 Mobimeos from over 39 nations
a relation between architecture and team setup
• “Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.” (Conway’s Law)
• Enables teams to make autonomous decisions
Addressing the most frequent pitfalls when transitioning to Microservices - 2020 12 08 - Magnus Kulke/Lothar Schulz
Service Boundaries are Defined by Contracts
perspective
◦ Behaviour: does not change unexpectedly
◦ Availability: when can we retire an API?
• How to express such a contract?
◦ Machine readable: Swagger/OpenAPI, JSON Schema, GraphQL
◦ API Versions
• Abstain from breaking changes
◦ Additional properties?
◦ Extending enums?
• Make everything optional: Protobuf3
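One way to live with additional properties and extended enums on the consumer side is a tolerant reader. The following is a minimal sketch under stated assumptions: the `Trip` payload, its field names, and the `Mode` values are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping, Optional


class Mode(Enum):
    BUS = "bus"
    TRAIN = "train"
    UNKNOWN = "unknown"  # fallback for enum values added by newer producers


@dataclass(frozen=True)
class Trip:
    trip_id: str
    mode: Mode
    platform: Optional[str] = None  # optional: may be absent in older payloads


def parse_trip(payload: Mapping[str, Any]) -> Trip:
    """Read only the fields this consumer knows; ignore additional properties."""
    try:
        mode = Mode(payload.get("mode", "unknown"))
    except ValueError:
        # The producer extended the enum; degrade instead of failing.
        mode = Mode.UNKNOWN
    return Trip(
        trip_id=str(payload["id"]),  # the only field treated as required
        mode=mode,
        platform=payload.get("platform"),
    )
```

With this reader, a producer can add a property or a new enum value without breaking the consumer; removing or renaming `id` would still be a breaking change.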
• Documents can be formally correct
• But semantics have changed
◦ References in a document
◦ Content: New ID for entity
• Pragmatic solution: Contract tests
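A contract test can assert semantics that a schema cannot express, such as references inside a document still resolving. The sketch below assumes a hypothetical response shape (`stops` with ids, `legs` referencing them); in a real setup the document would come from the provider's API or a recorded response.

```python
from typing import Any, Dict, List


def dangling_references(document: Dict[str, Any]) -> List[str]:
    """Return stop ids referenced by legs that are missing from `stops`."""
    known_ids = {stop["id"] for stop in document.get("stops", [])}
    missing: List[str] = []
    for leg in document.get("legs", []):
        for key in ("from", "to"):
            ref = leg[key]
            if ref not in known_ids and ref not in missing:
                missing.append(ref)
    return missing


def test_references_stay_resolvable() -> None:
    # Hypothetical sample; formally valid, and every reference resolves.
    response = {
        "stops": [{"id": "s1"}, {"id": "s2"}],
        "legs": [{"from": "s1", "to": "s2"}],
    }
    assert dangling_references(response) == []
```

If the provider starts issuing new ids for existing entities, the document stays schema-valid but this check fails, which is the kind of drift the slide describes.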
The Other Side: Protection from Harmful Workloads
traffic?
◦ Correlation Ids
◦ Callers need to tag their requests
• Manage access
◦ Service Accounts
◦ Declarative: Service Mesh
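Tagging requests could look roughly like the sketch below: a service reuses the correlation id it received, or creates one, and adds its own name to every outgoing call. Both header names are assumptions chosen for illustration, not a standard named in the talk.

```python
import uuid
from typing import Dict, Mapping

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name
CALLER_HEADER = "X-Caller-Service"  # hypothetical header naming the caller


def outgoing_headers(incoming: Mapping[str, str], service_name: str) -> Dict[str, str]:
    """Build headers for a downstream call, propagating the correlation id.

    `incoming` is assumed to use the exact header spelling above; a real
    HTTP framework would normally offer case-insensitive header lookup.
    """
    correlation_id = incoming.get(CORRELATION_HEADER) or str(uuid.uuid4())
    return {
        CORRELATION_HEADER: correlation_id,
        CALLER_HEADER: service_name,
    }
```

The provider can then attribute traffic to a caller and follow one request across services by grouping on the correlation id.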
Consensus Systems are Great 🖥💥 🔫
serial cable
◦ DRBD/GFS
◦ STONITH Hardware
• Complex HA machinery was often the cause of outages
Safe Coordination in Distributed systems
• Consensus Protocols
• L. Lamport: The Part-Time Parliament, 1998
• Simple example: Raft (consul, etcd)
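Majority-based protocols such as Raft make progress only while a quorum of nodes is reachable. The arithmetic behind typical cluster sizes can be sketched as follows; the function names are illustrative.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Number of nodes that may fail while a majority remains."""
    return cluster_size - quorum(cluster_size)
```

A cluster of 3 tolerates 1 failure and a cluster of 5 tolerates 2, while going from 3 to 4 nodes adds no extra tolerance, which is why odd cluster sizes are the usual choice.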
However: Murphy’s Law
We take a lot of things for granted + there are unknown unknowns.
Scenario 1: DockerHub
◦ Node cannot pull redis:latest 🙀
• DNS Load Balancing
• DNS transport is UDP
• UDP packets are limited in size
• Per spec (RFC 1035), DNS messages over UDP are limited to <= 512 bytes
• DNS responses > 512 bytes are truncated; the client is expected to retry over TCP
◦ Your sysadmin might not know this
◦ Security Group blocks tcp/53
• Not all resolvers are alike / agree on the spec
◦ Glibc “salvages” truncated DNS messages
◦ Golang DNS resolver (Docker) does not
◦ Quick fix: CGO_ENABLED=1
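The truncation signal itself is a single bit in the DNS header. As a rough sketch, the following reads the 12-byte header defined in RFC 1035 and checks the TC flag; the helper name is illustrative, and the code does not send any queries.

```python
import struct

# RFC 1035 header: id, flags, qdcount, ancount, nscount, arcount (16 bits each)
DNS_HEADER = struct.Struct("!HHHHHH")
TC_FLAG = 0x0200  # "truncated" bit within the 16-bit flags field
MAX_UDP_PAYLOAD = 512  # classic limit for DNS over UDP


def is_truncated(message: bytes) -> bool:
    """Return True if the DNS message has the TC (truncated) bit set."""
    if len(message) < DNS_HEADER.size:
        raise ValueError("DNS message shorter than the 12-byte header")
    _msg_id, flags, *_counts = DNS_HEADER.unpack_from(message)
    return bool(flags & TC_FLAG)
```

A resolver that sees `is_truncated(response)` return True is expected to repeat the query over TCP, which is exactly the step that fails when tcp/53 is blocked.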
• Our J2EE service is stuck in an exception loop
◦ Logs a lot of large stack traces (lots of lines)
• Engineers integrate cool .io SaaS for tailing logs in Logstash
◦ Every line is a request to the cool .io data sink
◦ For every line, a hostname is resolved
• The cloud provider disapproves and starts rate-limiting DNS for the service’s node
• K8S api-server/node communication is affected
◦ Node is marked as broken
◦ Scheduler moves the ever-crashing service to a fresh, healthy node
• Repeat
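One possible mitigation for the per-line lookups, not described in the talk, is to cache resolved addresses for a short time. The sketch below is an assumption-laden illustration: `CachingResolver` is a hypothetical helper, and the 30-second TTL is arbitrary rather than derived from the DNS record.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Resolve a hostname at most once per TTL instead of once per log line."""

    def __init__(
        self,
        resolve: Callable[[str], str] = socket.gethostbyname,
        ttl_seconds: float = 30.0,  # arbitrary illustrative value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # host -> (address, expiry)

    def lookup(self, hostname: str) -> str:
        now = self._clock()
        cached = self._cache.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]
        address = self._resolve(hostname)
        self._cache[hostname] = (address, now + self._ttl)
        return address
```

Shipping a thousand stack-trace lines through `lookup` then causes one DNS query per TTL window instead of a thousand; batching several lines per request would reduce the load further.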
• Nov. 25th Kinesis outage
◦ Every node connects with every other node
◦ After scaling, the thread count exceeded threads-max
• File Handles
◦ Some workloads do not properly close TCP/IP connections
◦ Intermediate proxies have to arbitrarily terminate them
◦ (Old) user-land kube-proxy leaked goroutines & file handles
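Why a full mesh runs into a per-node limit after scaling can be seen with simple arithmetic. The sketch below assumes one thread per peer connection; the function names and all numbers in the usage note are illustrative, not figures from the outage.

```python
def threads_per_node(fleet_size: int) -> int:
    """Threads one node needs if it keeps one thread per peer."""
    return max(fleet_size - 1, 0)


def mesh_connections(fleet_size: int) -> int:
    """Total connections in a full mesh of `fleet_size` nodes."""
    return fleet_size * (fleet_size - 1) // 2


def exceeds_thread_limit(fleet_size: int, threads_max: int, other_threads: int = 0) -> bool:
    """True if peer threads plus other threads exceed the configured limit."""
    return threads_per_node(fleet_size) + other_threads > threads_max
```

For example, with a hypothetical limit of 1000 threads and 100 threads used for other work, a fleet of 901 nodes still fits, and adding one more node pushes every node over the limit at the same time.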
Service Level Objectives
about the services they serve is used to define
- service level indicators (SLIs),
- objectives (SLOs),
- and agreements (SLAs).
SRE Book - Service Level Objectives
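The relation between an indicator and an objective can be made concrete with a small calculation. This is a sketch under assumptions: the SLI is taken to be the ratio of good requests to all requests, and the function names are illustrative.

```python
def availability_sli(good: int, total: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total <= 0:
        raise ValueError("total must be positive")
    return good / total


def meets_slo(good: int, total: int, slo_target: float) -> bool:
    """SLO check: is the measured SLI at or above the target?"""
    return availability_sli(good, total) >= slo_target


def error_budget_consumed(good: int, total: int, slo_target: float) -> float:
    """Share of the allowed failures already used (1.0 means budget exhausted)."""
    allowed_failure_ratio = 1.0 - slo_target
    if allowed_failure_ratio <= 0:
        raise ValueError("slo_target must be below 1.0")
    return (1.0 - availability_sli(good, total)) / allowed_failure_ratio
```

With 999 good requests out of 1000 and a target of 0.99, the SLO is met and roughly a tenth of the error budget is used; an SLA would attach consequences to missing such a target.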
Guidance - The Four Golden Signals
rate
- error rate - proportion of service errors
- traffic / system throughput - typically measured in requests per second
- availability - what’s the uptime of a service
- saturation - measures the system fraction, emphasizing the resources that are most constrained (e.g., in a memory-constrained system, show memory; in an I/O-constrained system, show I/O). I have seen systems degrade their service levels before being fully saturated; e.g., 90% CPU utilization already triggered a service degradation.
SRE Book - The Four Golden Signals
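Three of the signals listed above reduce to simple ratios. The sketch below computes them from raw observations; treating HTTP status codes of 500 and above as service errors is an assumption, as are the function names.

```python
from typing import Sequence


def error_rate(status_codes: Sequence[int]) -> float:
    """Proportion of requests that ended in a service error (assumed: status >= 500)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


def throughput(request_count: int, window_seconds: float) -> float:
    """Traffic in requests per second over an observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return request_count / window_seconds


def saturation(used: float, capacity: float) -> float:
    """Fraction of the most constrained resource that is in use."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity
```

Given the observation on the slide that degradation can start around 90% CPU utilization, an alert threshold on `saturation` would usually sit well below 1.0.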