Steps toward self-service operations in eureka
SRE NEXT 2022 5/14
https://sre-next.dev/2022/schedule#jp51
fukubaka0825
May 14, 2022
Transcript
1. Steps toward self-service operations in eureka
SRE NEXT 2022, 2022/05/14
© 2021 eureka, Inc. All Rights Reserved. CONFIDENTIAL INFORMATION: Not for Public Distribution - Do Not Copy. All Hands Meeting
2. Who am I
wapper/nari
• Site Reliability Engineer at eureka, Inc.
• Favorites: VR / Hip Hop / Skateboarding / Sauna
• Twitter
  ◦ Real: @fukubaka0825
  ◦ VR: @wapper0825
3. Eureka’s current situation
4. Products: 2 / Regions: 3 / Developers: 50+
5. Our SRE Team Practice Overview: Old (~2020)
6. Our SRE Team Practice Overview: New
7. Today’s topic scope: “Self-Service” Operation Design
8. Conclusion
• Good “Self-Service” operations are
  ◦ Low cognitive load
  ◦ Low operational load for “users”
  ◦ Secure and auditable
9. What/Why/How “Self-Service” Operations
10. What are “Self-Service” Operations?
11. Why “Self-Service” Operations?
12. How to build “Self-Service” Operations
Cognitive Load ⬇ / Operational Load ⬇ / Secure ⬆ / Auditable ⬆
13. 3 “Self-Service” Operations Examples in eureka
  1. Infrastructure as Code (Terraform) Operation
  2. Batch Container Operation
  3. Incident Response Operation
14. 3 “Self-Service” Operations Examples in eureka
  1. Infrastructure as Code (Terraform) Operation 👈
  2. Batch Container Operation
  3. Incident Response Operation
15. Overview
• Provide an IaC platform that lets developers develop and operate infrastructure with Terraform, following the Software Development Life Cycle (SDLC)
16. Policy as Code with Conftest/Rego
• Introduce Policy as Code to automatically review semantic problems that existing static analysis tools cannot cover, without relying on human review
Operational Load ⬇
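To make the idea concrete, here is a minimal sketch of the kind of semantic check a policy can encode. It is written in Python rather than Rego purely for illustration and is not eureka's actual policy. The rule itself (every `aws_s3_bucket` must carry an `Owner` tag) and the function name `deny_missing_owner_tag` are hypothetical. The sketch assumes the input has the shape of Terraform's JSON plan output (`terraform show -json`), where planned changes are listed under `resource_changes`.

```python
from typing import Any


def deny_missing_owner_tag(plan: dict[str, Any]) -> list[str]:
    """Return one message per planned S3 bucket that lacks an Owner tag.

    Hypothetical rule, used only to illustrate a policy check.
    """
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_s3_bucket":
            continue
        after = rc.get("change", {}).get("after")
        if after is None:
            # The resource is being deleted, so there is nothing to check.
            continue
        tags = after.get("tags") or {}
        if "Owner" not in tags:
            address = rc.get("address", "<unknown>")
            violations.append(f"{address}: missing required tag 'Owner'")
    return violations


# Hand-written sample input in the shape of a Terraform JSON plan.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_s3_bucket.logs",
            "type": "aws_s3_bucket",
            "change": {"actions": ["create"], "after": {"tags": {"Owner": "sre"}}},
        },
        {
            "address": "aws_s3_bucket.tmp",
            "type": "aws_s3_bucket",
            "change": {"actions": ["create"], "after": {"tags": None}},
        },
    ]
}

for message in deny_missing_owner_tag(sample_plan):
    print(message)
```

In CI, a non-empty result would fail the job, which is what moves this class of review from a human reviewer to an automated check.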
17. User-friendly CI Notification
• Report the results of the Terraform and Conftest commands run in CI in a form that makes it clear to users what to change and how to change it
• https://github.com/suzuki-shunsuke/tfcmt
• https://github.com/suzuki-shunsuke/github-comment
Cognitive Load ⬇
18. Terraform/AWS Workshop for Developers
• Held workshops to raise developers’ knowledge of Terraform and cloud infrastructure
Cognitive Load ⬇
19. 3 “Self-Service” Operations Examples in eureka
  1. Infrastructure as Code (Terraform) Operation
  2. Batch Container Operation 👈
  3. Incident Response Operation
20. Overview
• Provide a batch container platform for developers, built on AWS Fargate + Amazon EventBridge + AWS Lambda
  ◦ to manage batch schedules and compute resources through the SDLC by adding simple parameters in Terraform
  ◦ to execute ad hoc batch tasks with GitHub Actions
21. ECS Fargate worker task auto scaler with AWS Lambda
• Autoscaling based on the current number of Fargate tasks and the SQS queue depth
  ◦ Determine the number of tasks to run from the difference between the “Backlog” (visible message count) and the “Appropriate Backlog” (currently running tasks × specified capacity per task)
• Eliminates the need for detailed capacity planning
Operational Load ⬇
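The backlog comparison above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the Lambda function from the talk: the names `tasks_to_start` and `max_tasks`, the ceiling division, and the upper cap are all assumptions added here, and scale-in is not covered.

```python
import math


def tasks_to_start(
    visible_messages: int,
    running_tasks: int,
    capacity_per_task: int,
    max_tasks: int,
) -> int:
    """Return how many additional tasks to start for the current backlog.

    visible_messages:  SQS backlog (visible message count)
    running_tasks:     Fargate tasks currently running
    capacity_per_task: messages one task is expected to absorb (assumed > 0)
    max_tasks:         hypothetical upper bound on total tasks
    """
    if capacity_per_task <= 0:
        raise ValueError("capacity_per_task must be positive")

    # Backlog that the currently running tasks can already absorb.
    appropriate_backlog = running_tasks * capacity_per_task
    excess = visible_messages - appropriate_backlog
    if excess <= 0:
        return 0

    # Enough extra tasks to cover the excess, limited by the remaining headroom.
    needed = math.ceil(excess / capacity_per_task)
    headroom = max(max_tasks - running_tasks, 0)
    return min(needed, headroom)


print(tasks_to_start(visible_messages=100, running_tasks=2, capacity_per_task=10, max_tasks=20))
```

With 100 visible messages, 2 running tasks, and a capacity of 10 messages per task, the appropriate backlog is 20, the excess is 80, and the function asks for 8 more tasks.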
22. Terraform module with few required parameters
• Developers can deploy a resource by calling the module with a minimal set of required variables
• Developers can override CPU, memory, task count, and other parameters as needed
Cognitive Load ⬇
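The design idea, a few required inputs with overridable defaults for everything else, can be sketched outside Terraform as well. The following Python dataclass is only an analogy for the module interface; the field names, the `BatchTask` type, and the default values are hypothetical and do not come from eureka's module.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BatchTask:
    # Required inputs: the minimum a developer has to supply (hypothetical names).
    name: str
    image: str
    schedule: str
    # Optional inputs with defaults chosen by the platform team (hypothetical values).
    cpu: int = 256
    memory: int = 512
    task_count: int = 1


# Minimal call: only the required parameters are given.
nightly = BatchTask(name="nightly-report", image="example/report:latest", schedule="cron(0 15 * * ? *)")

# Override only what differs from the defaults.
heavy = replace(nightly, name="heavy-report", cpu=1024, memory=2048)

print(nightly)
print(heavy)
```

Keeping the required list short is what lowers cognitive load: a developer can ignore sizing until a task actually needs more than the defaults.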
23. Adhoc batch task runner with GitHub Actions Workflow Dispatch
• In the first step of the job, check whether the user is allowed to run the task, based on their GitHub user ID (team ID)
• Easily track the history of who did what
Secure ⬆ / Auditable ⬆
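A first-step check of this kind might look like the sketch below. It assumes the job reads the triggering user from the `GITHUB_ACTOR` environment variable that GitHub Actions sets, and compares it with an allowlist. The allowlist contents, the function names, and the audit line format are hypothetical; the talk does not show how the actual check or the team lookup is implemented.

```python
import os
import sys

# Hypothetical allowlist; in practice this could be derived from team membership.
ALLOWED_ACTORS = {"alice", "bob"}


def is_authorized(actor: str, allowed: set[str]) -> bool:
    """Return True if the actor is on the allowlist (GitHub logins are case-insensitive)."""
    return bool(actor) and actor.lower() in {name.lower() for name in allowed}


def audit_line(actor: str, task: str, authorized: bool) -> str:
    """Build a one-line record of who tried to run what, and whether it was allowed."""
    return f"actor={actor} task={task} authorized={str(authorized).lower()}"


def main() -> int:
    actor = os.environ.get("GITHUB_ACTOR", "")
    task = os.environ.get("BATCH_TASK", "unknown")  # hypothetical workflow input
    ok = is_authorized(actor, ALLOWED_ACTORS)
    print(audit_line(actor, task, ok))
    # A non-zero exit code fails the step, so later steps never run.
    return 0 if ok else 1


if __name__ == "__main__":
    sys.exit(main())
```

Because the check runs inside the workflow, every attempt is also recorded in the workflow run history, which supports the audit trail mentioned above.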
24. 3 “Self-Service” Operations Examples in eureka
  1. Infrastructure as Code (Terraform) Operation
  2. Batch Container Operation
  3. Incident Response Operation 👈
25. Overview
• Provide an incident response platform with a ChatOps interface to reduce the burden of responding to incidents, shorten MTTR as much as possible, and see the postmortem process through to completion
26. ChatOps to issue Incident ticket/channel
• Integrate with Slack, which everyone already knows, so that incidents can be reported with the simplest possible commands and steps
Cognitive Load ⬇
27. Add Incident Response flow to General On-boarding Process
• A bot introduces the incident response flow as part of the onboarding process, which saves labor and keeps awareness of the flow continuous
Cognitive Load ⬇
28. Postmortem Template
• Postmortems can be created from a template with one click of a button in Confluence
Operational Load ⬇
29. Future Prospects
(Quoted from O’Reilly, Seeking SRE, Chapter 4)
• Introduce the “Timeline Model” to automate more of the incident response flow
• Measure the time between “Response”, “Mitigate”, and “Repair”, and analyze it to shorten MTTR
Operational Load ⬇
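Measuring those intervals comes down to subtracting timestamps recorded at each phase. The sketch below assumes each incident has one timestamp per phase, keyed by hypothetical names `response`, `mitigate`, and `repair`; how the timestamps would be captured (for example from ChatOps commands) is not specified in the talk.

```python
from datetime import datetime, timedelta


def phase_durations(timeline: dict[str, datetime]) -> dict[str, timedelta]:
    """Return the time spent between consecutive incident phases."""
    phases = ["response", "mitigate", "repair"]  # hypothetical phase keys
    missing = [p for p in phases if p not in timeline]
    if missing:
        raise ValueError(f"missing phases: {missing}")

    durations = {}
    for start, end in zip(phases, phases[1:]):
        delta = timeline[end] - timeline[start]
        if delta < timedelta(0):
            raise ValueError(f"{end} is recorded before {start}")
        durations[f"{start}_to_{end}"] = delta
    return durations


# Made-up timestamps for one incident.
incident = {
    "response": datetime(2022, 5, 14, 10, 0),
    "mitigate": datetime(2022, 5, 14, 10, 25),
    "repair": datetime(2022, 5, 14, 11, 40),
}

for name, delta in phase_durations(incident).items():
    print(name, delta)
```

Collecting these per-incident durations over time shows which phase dominates, which is where automation effort is most likely to shorten MTTR.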
30. Conclusion
• Good “Self-Service” operations are
  ◦ Low cognitive load
  ◦ Low operational load for “users”
  ◦ Secure and auditable
31. References
• Self-Service Operations
• Changes in eureka’s Terraform Component Delivery Process over the past year: facing Productivity, Security, and Privacy on a rapidly growing product platform (eurekaにおけるここ一年のTerraform Component Delivery Processの変化 急成長していくProduct基盤のProductivity,Security,Privacyとの向き合い)
• Introducing Conftest and setting up CI with GitHub Actions to automate Terraform reviews (Terraformのレビューを自動化するために、Conftestを導入してGitHub ActionsでCIまで設定してみる)
• Scaling based on Amazon SQS
• Self-Service, Siloing and Organizational Structure (Self-Serviceとサイロ化と組織構造)
• Management to achieve SRE (SRE を実現するための組織マネジメント)
• Seeking SRE
• Achieving human-machine integrated security measures with a Slack bot that supports incident response through automation (インシデントレスポンスを自動化で支援する Slack Bot で人機一体なセキュリティ対策を実現する)