How to Dev&Ops Internal PaaS
taichi nakashima
June 29, 2015
Talked at
http://www.zusaar.com/event/9057007
Transcript
HOW TO DEV&OPS INTERNAL PAAS
TAICHI NAKASHIMA @deeeet @tcnksm
INTERNAL PAAS? = PaaS for Rakuten engineers
ONLY FOR TEST? = No. It receives production requests
WHY PAAS? = Fast app experimentation and iteration on a PROD-grade platform
WHY PAAS? = You don’t need to prepare servers by yourself
WHY PAAS? = You don’t need to provision servers by yourself
WHY PAAS? = You don’t need to prepare DBs by yourself
WHY PAAS? = You can scale your app with *one command*
WHY PAAS? = You can focus on development, not deployment
WHY INTERNAL PAAS? = Easy to connect with other internal services
WHY INTERNAL PAAS? = Instant support when something happens
WHY INTERNAL PAAS? (From an organizational point of view) = You can reduce tooling duplicated across teams
HOW LARGE? = How many requests? Servers? Languages?
16000 req/sec = All application requests
2500 instances = 1400 (PROD) + 700 (STG) + 400 (DEV)
4300 VMs = 2800 (PROD) + 1200 (STG) + 300 (DEV)
+300 VMs/month = Growth forecast
4 supported languages = Ruby, Node.js, Java, PHP
3 DB services = Redis, MongoDB, Clustrix
100 Redis clusters = 230 instances
40 components = Components (roles) needed to run the PaaS
320 Chef recipes = `ls cookbooks/*/recipes | wc -l`
8 engineers = Dev & Ops, from 7 countries
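The sizing figures above can be checked and extended with a little arithmetic. The sketch below only reuses the numbers quoted on the slides; treating the +300 VMs/month forecast as linear growth is an assumption of this example, and `projected_vms` is a hypothetical helper name.

```python
# Figures quoted on the slides above.
INSTANCES = {"PROD": 1400, "STG": 700, "DEV": 400}
VMS = {"PROD": 2800, "STG": 1200, "DEV": 300}
VM_GROWTH_PER_MONTH = 300


def projected_vms(months: int) -> int:
    """Total VM count after `months`, assuming growth stays linear (assumption)."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return sum(VMS.values()) + VM_GROWTH_PER_MONTH * months


if __name__ == "__main__":
    print("instances:", sum(INSTANCES.values()))
    print("VMs today:", projected_vms(0))
    print("VMs in 12 months:", projected_vms(12))
```

The per-environment numbers add up to the totals on the slides (2500 instances, 4300 VMs), which is a quick sanity check on the transcript.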
HOW TO DEV&OPS INTERNAL PAAS
Components = Router, API, Health Check, Messaging, DBs, Apps
DEV FLOW RELEASE FLOW
DEV FLOW = Create ticket on JIRA, Write code, Write Chef cookbook, Test on LAB, Create PR (Git-Flow), Review
RELEASE FLOW = Assign release manager, Collect all JIRA tickets, Write internal blog, Canary Release, Release
1 release takes 1 week = DEV (2 days), STG (2 days), PROD (3 days)
HOW TO RELEASE? = Chef + Capistrano
RELEASE 1 SERVER
Steps = Service-out -> Run Chef solo -> Run Serverspec -> Service-in
Service-out = Stop Load-Balancing, Disable Health Check, Stop monit
Service-in = Start monit, Enable Health Check, Start Load-Balancing
Scripts = /etc/service-out, /etc/service-in
Every server has the same startup/stop scripts = the workflow is the same = automation is easy
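The single-server workflow above can be sketched as one ordered list of steps. This is a minimal illustration, not the team's tooling: the step names come from the slides, but the commands behind them are not shown in the talk, so each step is handed to a caller-supplied `run` function (hypothetical). Stopping on the first failure and leaving the server out of service is also an assumption.

```python
from typing import Callable, List

# Step names as listed on the slides; the real commands are not shown there.
SERVICE_OUT = ["stop load-balancing", "disable health check", "stop monit"]
SETUP = ["run Chef solo", "run Serverspec"]
SERVICE_IN = ["start monit", "enable health check", "start load-balancing"]


def release_one_server(host: str, run: Callable[[str, str], bool]) -> List[str]:
    """Run service-out, setup and service-in on one host, in that order.

    `run(host, step)` returns True on success. On the first failure the
    release stops, so the server is never put back into service
    (an assumption of this sketch).
    """
    done: List[str] = []
    for step in SERVICE_OUT + SETUP + SERVICE_IN:
        if not run(host, step):
            raise RuntimeError(f"{host}: step failed: {step}")
        done.append(step)
    return done


if __name__ == "__main__":
    def dry_run(host: str, step: str) -> bool:
        print(f"[{host}] {step}")
        return True

    release_one_server("170.20.20.21.RoleA", dry_run)
```

Service-in mirrors service-out in reverse order, so traffic returns only after monit and the health check are back.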
RELEASE X SERVERS
cap service-out = Service-out X servers
cap setup-role = Run Chef solo and Run Serverspec on X servers
cap service-in = Service-in X servers
Roles = Role A, Role B, Role C
VMLIST (Role A) = 170.20.20.21.RoleA 170.20.20.22.RoleA 170.20.20.23.RoleA 170.20.20.24.RoleA 170.20.20.25.RoleA 170.20.20.26.RoleA 170.20.20.27.RoleA
Operation on Role A = cap service-out, then cap setup-role, then cap service-in, each with parallel execution across the VMLIST
Next role (Role B) = cap service-out with parallel execution across VMLIST 170.20.20.31.RoleB 170.20.20.32.RoleB 170.20.20.33.RoleB 170.20.20.34.RoleB 170.20.20.35.RoleB 170.20.20.36.RoleB 170.20.20.37.RoleB
Start from Canary = cap service-out on VMLIST 170.20.20.21.RoleA only
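The multi-server flow above (role by role, parallel within a role, canary first) can be sketched as follows. This is an illustration in Python, not the Capistrano tasks themselves: `task` is a hypothetical callback standing in for the real `cap` task, and running a canary once per role rather than once per release is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

PHASES = ["service-out", "setup-role", "service-in"]  # cap task names from the slides


def group_by_role(vmlist: Iterable[str]) -> Dict[str, List[str]]:
    """Group VMLIST entries such as '170.20.20.21.RoleA' by their role suffix."""
    roles: Dict[str, List[str]] = {}
    for entry in vmlist:
        _, role = entry.rsplit(".", 1)
        roles.setdefault(role, []).append(entry)
    return roles


def run_phase(phase: str, hosts: List[str], task: Callable[[str, str], None]) -> None:
    """Run one phase on all hosts in parallel and wait until every host is done."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        # list() drains the results so that worker exceptions propagate here.
        list(pool.map(lambda host: task(phase, host), hosts))


def rolling_release(vmlist: Iterable[str], task: Callable[[str, str], None]) -> None:
    """Release role by role: the first host is the canary, the rest run in parallel."""
    for _role, hosts in group_by_role(vmlist).items():
        for batch in (hosts[:1], hosts[1:]):
            if not batch:
                continue
            for phase in PHASES:
                run_phase(phase, batch, task)


if __name__ == "__main__":
    vmlist = ["170.20.20.21.RoleA", "170.20.20.22.RoleA", "170.20.20.31.RoleB"]
    rolling_release(vmlist, lambda phase, host: print(f"cap {phase} {host}"))
```

Because each phase finishes on the whole batch before the next phase starts, a failed setup on any host stops the release before service-in runs for that batch.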
HOW TO DEV&OPS INTERNAL PAAS
LOGGING MONITORING ALERT HANDLING SUPPORT IAAS
LOGGING
700GB/day logs = All logs produced in PaaS
LOGGING IN PAAS? = Application logs + Component logs
APPLICATION LOG? = PaaS should give users a way to debug
Instant logs = Real time
Midterm logs = 1-2 weeks
Longterm logs = Up to 6 months
Router, API, Health Check, Messaging, DBs, Apps (Instant log)
Apps (Instant log), Log Server (Midterm log), Object Storage (Longterm log)
Apps (Instant log), Log Server (Midterm log), Hadoop (BigData team) = Analytics
Apps (Instant log), Log Server (Midterm log), Splunk = Dashboard
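The three application-log tiers above can be read as a lookup by log age. The sketch below is only an illustration: the tier names and rough durations come from the slides, while the exact cut-offs (a 1-minute "real time" window, 2 weeks, 180 days) and the function name `tier_for_age` are assumptions.

```python
from datetime import timedelta

# Durations read off the slides: instant = real time, midterm = 1-2 weeks,
# longterm = up to 6 months. The precise boundaries below are assumptions.
REALTIME_WINDOW = timedelta(minutes=1)
MIDTERM_RETENTION = timedelta(weeks=2)
LONGTERM_RETENTION = timedelta(days=180)


def tier_for_age(age: timedelta) -> str:
    """Return the tier that would still hold a log line of the given age."""
    if age < timedelta(0):
        raise ValueError("age must be non-negative")
    if age <= REALTIME_WINDOW:
        return "instant"
    if age <= MIDTERM_RETENTION:
        return "midterm"
    if age <= LONGTERM_RETENTION:
        return "longterm"
    return "expired"


if __name__ == "__main__":
    for age in (timedelta(seconds=5), timedelta(days=3), timedelta(days=30)):
        print(age, "->", tier_for_age(age))
```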
COMPONENT LOG? = Logs we use to debug the PaaS itself
Log Server, Object Storage
Log Server, Object Storage = We can debug CF (Cloud Foundry) here
Log Server, Object Storage (GlusterFS, LeoFS)
Log Server, Object Storage (GlusterFS)
METRICS = OpenTSDB, Pandora FMS
ALERT HANDLING
1 week, in charge 24H = Primary & Sub admin
2500 ✉/day MAX = Need to fix…
SUPPORT = JIRA, HipChat
Instant support is one of the *good* points of Internal PaaS
IAAS = Operating PaaS also means operating IaaS
vSphere
HOW TO BOOT SERVERS? = Internal tool like Terraform
Role A, vSphere, Operation
VMLIST = 170.20.20.21.RoleA 170.20.20.22.RoleA 170.20.20.23.RoleA
rvc.yml:
RoleA:
  cpu: 2
  mem: 8192
rvc create -c rvc.yml 170.20.21.RoleA
rvc create -c rvc.yml 170.20.22.RoleA
rvc create -c rvc.yml 170.20.23.RoleA
cap setup-role = Operation sets up Role A on the VMLIST servers booted on vSphere
Easy to boot & set up servers = If there are *physical resources*
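The boot step above amounts to one `rvc create` call per VMLIST entry. The sketch below only builds those command lines and executes nothing, since the tool is internal. The `rvc create -c <config> <host>` form is copied from the slides; passing each VMLIST entry unchanged as the host argument is an assumption (the slides show a shorter host string), and `boot_commands` is a hypothetical helper name.

```python
import shlex
from typing import Iterable, List


def boot_commands(vmlist: Iterable[str], config: str = "rvc.yml") -> List[str]:
    """Build one `rvc create` command line per VMLIST entry (nothing is run)."""
    return [shlex.join(["rvc", "create", "-c", config, host]) for host in vmlist]


if __name__ == "__main__":
    vmlist = ["170.20.20.21.RoleA", "170.20.20.22.RoleA", "170.20.20.23.RoleA"]
    for command in boot_commands(vmlist):
        print(command)
```

Keeping the VMLIST as the single input means the same list can drive both booting and the later `cap setup-role` step.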
FUTURE? = We are moving to *version 2*
BE GOPHER = Cloud Foundry is moving from Ruby to Golang
NO FORK = Everything goes upstream
BE OPEN = Building tools as OSS
NO MORE TOO MUCH ✉ = Planning to use PagerDuty + Riemann
Log Server, Object Storage (GlusterFS, LeoFS)
Object Storage (LeoFS), Kafka
MORE FLEXIBLE LOG STACK = Planning to use Apache Kafka
NEW METRICS STACK = Planning to use InfluxDB + Grafana
CONTAINER = Planning to support Docker
MORE HA = Planning to run a Chaos Monkey
NEW IAAS = Migrating to OpenStack
NEW IAAS = Planning to go Hybrid Cloud
WE HAVE MANY CHALLENGES
WE ARE HIRING http://corp.rakuten.co.jp/careers/experienced/
@deeeet