How Bitso Empowers Its Devs to Troubleshoot K8s Independently

More often than not, Kubernetes behaves as it should: fast, agile and scalable. But what happens when K8s misbehaves? Do the responders have the right tools to deal with the situation? Or are they just part of an escalation chain that always leads to the same small group of experts?

Meet Bitso, a billion-dollar company and the largest crypto platform in Latin America, which has mastered the shift-left approach and empowered its dev team to troubleshoot K8s independently.

Join this session to hear Bitso’s engineering lead, Juan Jose Mejia, and Komodor’s head of solution architects, Oren Ninio, as they share Bitso’s journey and how the company was able to:

Create a lean (and mean) K8s troubleshooting workflow that reduced MTTR by 75%
Relieve DevOps bottlenecks and save more than 30 DevOps hours every week
Use a mix of developer-friendly tools and training strategies to bridge the K8s knowledge gap across the organization within just a few months

Komodor

May 01, 2022

Transcript

  1. WHO ARE WE?

    Oren Ninio, Head of Solution Architecture @Komodor. Juan Jose Mejia, Engineering Lead @Bitso.
  2. ABOUT BITSO

    Bitso is a fully remote organization with Bitsonauts in over 38 countries around the world. Bitso’s mission is to make crypto useful by providing millions of Latin Americans with an alternative way to access fast, reliable financial services powered by crypto. This will help address financial inclusion for the 70% of unbanked and underrepresented communities in the region. With over 4 million users, Bitso is the leading cryptocurrency platform in Latin America.
  3. TRANSITIONING FROM A STARTUP TO A MEDIUM-SIZED COMPANY

    How did our architecture change?
    • From monolithic services, each living in a K8s pod, to microservices living in their own pods.
    • From unclustered relational databases to clustered databases.
    • From a synchronous messaging architecture to an asynchronous, event-driven architecture.
  4. CHALLENGE #1

    The migration to K8s meant we needed to re-evaluate our legacy monolithic architecture:
    • Services ran on VMs from a custom provider.
    • Logs could only be accessed via kubectl.
    • Jenkins CI required an extra manual step to run a CodeBuild job.
    • Standalone VPNs required a lot of maintenance.
  5. CHALLENGE #2: Our dev processes were no longer relevant

    The migration to K8s also meant we needed to re-evaluate our existing dev processes, for example:
    • Manual merges to dev/stage.
    • Creation of new services on Kubernetes.
    • Pull request interaction.
    • Logging in to AWS.
  6. CHALLENGE #3: We noticed a lack of K8s expertise and knowledge

    K8s is an emerging technology, and few engineers have a deep understanding of its inner workings. This knowledge gap slowed down our development processes and limited our troubleshooting capabilities. Source: The State of Kubernetes 2021 (VMware)
  7. SOLUTION #1: Implemented K8s-friendly tools

    • Replaced our standalone VPNs with AWS VPNs to improve uptime and reliability.
    • Replaced Jenkins CI with CircleCI to get automated pipelines that run without human intervention.
  8. SOLUTION #1: Implemented K8s-friendly tools

    • Implemented Komodor to help our devs troubleshoot issues in production clusters and lower environments. With Komodor, we had access to pod health statuses, deployment and health-change events, and logs per pod instance, all in one central place.
    • Instead of accessing logs via kubectl in the console, we implemented Splunk to view our logs in more detail in the browser.
  9. SOLUTION #2: Adapted our processes to our cloud-native architecture

    • Implemented a bot that automatically merges branches and undoes those merges.
    • Set up sync meetings to anticipate and plan the creation of new services.
    • Changed our process so that teams agree on a solution together before coding, which brings each PR closer to what the team expects.
    • Used tools like saml2aws to simplify interaction with the AWS CLI.
  10. SOLUTION #3: Continuously bridging the K8s knowledge gap

    We established a dedicated weekly knowledge-sharing session for our engineering teams. It has multiple objectives, including:
    • Sharing specific use cases with K8s.
    • Discussing best practices the devs would like to share.
    • Presenting specific issues that were troubleshot and the lessons learned from them.
  11. Q&A
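Slide 3 mentions a move from synchronous messaging to an asynchronous, event-driven architecture. The deck does not say which message broker or framework Bitso uses. The sketch below is therefore only a toy, in-process illustration. It assumes Python's standard `asyncio.Queue` as a stand-in for a real broker. The `publish`/`consume` helpers and the event names are hypothetical. The idea it shows is that the producer hands off an event and moves on, rather than blocking until the consumer has finished processing it.

```python
import asyncio

# Hypothetical stand-in for a message broker: an in-process asyncio.Queue.
# A production system would use a real broker; the deck does not name one.


async def publish(queue: asyncio.Queue, event: dict) -> None:
    # The producer only enqueues the event and returns; it does not wait
    # for any consumer to process it.
    await queue.put(event)


async def consume(queue: asyncio.Queue, handled: list) -> None:
    # The consumer runs independently, handling events as they arrive.
    while True:
        event = await queue.get()
        handled.append(event["type"])
        queue.task_done()


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    handled: list = []
    worker = asyncio.create_task(consume(queue, handled))
    for kind in ("order_created", "order_paid"):  # hypothetical event names
        await publish(queue, {"type": kind})
    await queue.join()  # wait until every published event has been handled
    worker.cancel()     # stop the otherwise endless consumer loop
    return handled


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Swapping the queue for a networked broker keeps the same shape. Producers and consumers only share the event contract, so each microservice can be deployed and scaled in its own pod.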