Running a production-ready service mesh is VERY hard. From fully understanding the mesh architecture and installing it to scaling, upgrading, monitoring, and securing it, building enough confidence to release it into production can take many months.
For more than two years, our Traffic team has been running an Istio-based service mesh. We faced plenty of problems across the many phases of a project this big: picking the initial features, choosing the mesh architecture, working out how to hide the mesh's complexity from users, and finally releasing it to production.
In this talk we want to share our biggest stumbles across the installation, maintenance, monitoring, upgrading, and operation of Istio. We believe that sharing some hard-to-find tips and tricks can save people and organizations a lot of time when adopting the Istio service mesh.
Today the mesh is our main traffic solution. We chose to run a single mesh that spans the Kubernetes clusters of all our business units, creating a virtual, flat-network-like environment. Istio is responsible for mutual TLS, traffic routing, canary releases, retry policies, outlier detection, circuit breaking, and authentication and authorization between microservices and users, and it is a core piece of software in our current multi-region initiative. We have also extended it with other custom features.
Join us on this service mesh adventure through a minefield and learn how to avoid almost all of the mines.