company. All rights reserved.

SAP Concur: In a typical day
§ 2.4M expense receipts uploaded
§ 275K trips booked
§ 700K mobile logins
§ $187M invoices processed

SAP Concur DevOps Journey

Break down monolith
§ Decompose into a set of services

E2E teams
§ A dedicated team owns each service
§ The team is enabled and responsible for each aspect of the service
§ Self-service cloud services enabling E2E ownership
§ Logging, Alerting, Monitoring, Analytics (LAMA) services to enable Run and Monitoring

[Diagram labels: Customers, Auditors, Go to Market; E2E Teams (repeated); Limited centralized ownership; Delivery Pipeline, Security, QE, Cloud Services, Production Environments]

Logging Service Journey
§ Start Small: logs storage
§ Build Trust: reliable logging as a service solution
§ LAMA as Strategy: enable DevOps culture and end-to-end ownership
§ Future: machine learning for Ops insights and alerting; machine learning for intelligent canarying

Pre-Logging Service

Features
§ XML logs, logs storage
§ Web UI with limited log search capabilities

Architecture
§ RMQ as ingestion pipeline
§ SQL Server as logs storage
§ Homebrew ASP-based Web UI for logs search

Usage
§ Travel and Expense Dev teams
§ In peak hours 1,500 logs/sec, 200 GB of data per day
§ Soon faced scalability and performance issues

Why Elasticsearch
§ Time to explore a new solution -> Elasticsearch
§ High-performing and horizontally scalable solution
§ Success stories and good feedback from the community

Logging Service v1

Features
§ JSON logs, Kibana3 to search and visualize logs in dashboards
§ Homebrew Watcher and Watcher UI

Architecture
§ RMQ, Logstash, FluentD as ingestion pipeline
§ Elasticsearch 1.1 - 1.7
§ Watcher solution based on Chronos, Mesos, Zookeeper

Usage
§ More Dev teams started to adopt the service
§ In peak hours 5,000 docs/sec, 0.8 TB per day on peak days

Why Upgrade
§ Watcher solution is complex
§ Elastic Watcher available in 2.x

[Diagram labels: Visualization; Distributed data storage; Data-in pipeline (JSON, fluentd); Logging Service cluster; Concur Watcher cluster (Apache Mesos, Apache Zookeeper, Chronos)]

Logging Service v2

Features
§ Elastic Gold license
§ Elastic Watcher, homebrew Watcher UI
§ Shield for enterprise security
§ Aggregations UI

Architecture
§ RMQ, Logstash, Beats as ingestion pipeline
§ Elasticsearch 2.3, Shield 2.3, Watcher 2.3
§ Kibana 4.5

Usage
§ Most Dev teams use the logging service
§ In peak hours, up to 60,000 docs/sec, up to 4 TB per day
§ Logging Service considered a dial-tone service, core infra service
§ TV screens across Concur offices display Ops dashboards
§ Watcher widely used

Why Upgrade
§ Many new and fancy features
§ Improved security, GDPR
§ Cross-cluster search

Logging Service v5 Architecture

[Architecture diagram. Layers: Ingestion Pipeline; Data Storage; X-cluster Search; Data View. Labels: Elasticsearch 5.6 (three times); 6.1; X-cluster search; REST API; Kibana - TV only; Kibana - users]
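
As a rough illustration of what the X-cluster Search layer does for a client, the sketch below builds a cross-cluster search path using Elasticsearch's `cluster_alias:index_pattern` syntax. The cluster aliases and the index pattern are hypothetical; the slides do not name them.

```python
def cross_cluster_target(cluster_aliases, index_pattern):
    """Join remote clusters and an index pattern into one search target.

    Elasticsearch cross-cluster search addresses a remote index as
    "<cluster_alias>:<index>"; several targets are comma-separated.
    """
    return ",".join(f"{alias}:{index_pattern}" for alias in cluster_aliases)


def search_path(cluster_aliases, index_pattern):
    """Return the REST path a client would send its _search request to."""
    return f"/{cross_cluster_target(cluster_aliases, index_pattern)}/_search"


# Hypothetical aliases for two data storage clusters behind the search cluster.
EXAMPLE_PATH = search_path(["storage_a", "storage_b"], "app-logs-*")
```

With this approach, users query one endpoint and do not need to know which storage cluster holds their data.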

Logging Service v5 Data Mapping

Root Level Fields
§ never change
§ used mainly for data correlation
§ applied on all data
§ Examples:
 – application
 – roletype
 – host
 – geo

Shared Buckets
§ almost never change
§ particular data type
§ Examples:
 – kubernetes.{fields}
 – exception_obj.{fields}
 – nginx.{fields}
 – mobile.{fields}

Custom Buckets
§ defined by users, changed any time
§ customizations
§ owned by team/application
§ Examples:
 – my_app_data.{fields}
 – mm_data.{fields}
 – just_test_data.{fields}
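
The three mapping tiers can be illustrated with a small sketch that sorts a log document's top-level keys into root-level fields, shared buckets, and custom buckets. The root field and bucket names come from the slide; the sub-fields and values in the example document are hypothetical, and treating every other top-level key as a custom bucket is a simplifying assumption.

```python
# Names taken from the slide.
ROOT_FIELDS = {"application", "roletype", "host", "geo"}
SHARED_BUCKETS = {"kubernetes", "exception_obj", "nginx", "mobile"}


def classify_keys(doc):
    """Split a document's top-level keys into the three mapping tiers.

    Assumption: any key that is neither a root field nor a shared bucket
    is treated as a team-owned custom bucket.
    """
    keys = set(doc)
    return {
        "root": sorted(keys & ROOT_FIELDS),
        "shared": sorted(keys & SHARED_BUCKETS),
        "custom": sorted(keys - ROOT_FIELDS - SHARED_BUCKETS),
    }


def missing_root_fields(doc):
    """List root-level fields absent from the document."""
    return sorted(ROOT_FIELDS - set(doc))


# Hypothetical log document; all values are made up for illustration.
example_doc = {
    "application": "expense-api",
    "roletype": "web",
    "host": "host-01",
    "geo": "us",
    "nginx": {"status": 200, "request_time": 0.012},
    "my_app_data": {"report_id": "r-123", "step": "submit"},
}
```

Because root-level fields are applied on all data, a check such as `missing_root_fields` could run at ingestion time so that every document stays usable for correlation.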

Logging Service v5 Usage
§ 10 instances, in each Concur Cloud environment
 – both AWS and private data centers
 – clusters across 3 or 4 AZ for HA, up to 35 primary shards, 1 replica
§ Concur US datacenter, largest logging service
 – in peak hours up to 60k - 80k docs/s, up to 5 TB per day
 – 3 clusters, 130 data nodes in total, x-cluster search cluster
§ Data types (retention)
 – application logs and metrics (6 weeks)
 – infrastructure logs (2 weeks)
 – GDPR audit logs (13 months)
§ Kibana and Watcher widely used across the company
 – E2E teams, SRE teams, Customer Support, Executive Leadership
 – Kibana dashboard only for TV screens and Ops analysis
§ Authentication and Authorization with SAML 2.0 (OKTA) and Enterprise security
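
The retention periods listed above could be evaluated per data type roughly as follows. This is a sketch only: the slides do not say how retention is enforced, and the data type keys and the idea of comparing against an index's calendar day are assumptions.

```python
import calendar
from datetime import date, timedelta

# Retention per data type, as stated on the slide.
RETENTION = {
    "application": ("weeks", 6),
    "infrastructure": ("weeks", 2),
    "gdpr_audit": ("months", 13),
}


def months_before(day, months):
    """Step back whole calendar months, clamping to the month's last day."""
    total = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def retention_cutoff(data_type, today):
    """Return the oldest day that is still retained for a data type."""
    unit, amount = RETENTION[data_type]
    if unit == "weeks":
        return today - timedelta(weeks=amount)
    return months_before(today, amount)


def is_expired(data_type, index_day, today):
    """True when data written on index_day is past its retention period."""
    return index_day < retention_cutoff(data_type, today)
```

For example, with `today = date(2018, 3, 31)`, application data older than 17 February 2018 would be expired, while GDPR audit data would be kept back to 28 February 2017.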

Logging Service v5 Zero-touch Production Deployments and Ops

GitHub as the Only Source of Truth
§ Consistency across environments. Same branch, same code gets deployed into each Concur environment.
§ Team knowledge sharing. Version control. Code-driven rollouts and rollbacks.

Infrastructure as Code. Terraform. Configuration as Code. Operations as Code. Ansible. Codified Deployment Pipelines. Jenkins.
§ Fully automated canary deployments

Monitoring and Alerting as Code. “Batteries Included”.
§ Logging Services get deployed fully equipped with monitoring and alerting
§ Elastic Beats deployment fully automated. Kibana dashboards and Watcher definitions kept as code and deployed as part of the logging service deployment pipelines

Logging Service v5 Zero-touch Production Ops

Maintenance Operations as Code
§ Ops repeated more than two or three times are codified and automated
§ Maintenance playbooks for service restarts, upgrades, configurations
§ Playbooks are triggered from the webhook actions of Watcher watches
§ Chaos engineering
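
A watch that triggers a maintenance playbook through a webhook action might look roughly like the sketch below, which builds the watch body as a Python dictionary in the Watcher format of that era (`trigger`, `input`, `condition`, `actions`). The index pattern, the `level` field, the threshold, and the webhook host and path are all hypothetical; the slides do not show the actual watch definitions.

```python
import json


def build_restart_watch(service, playbook_host, playbook_path):
    """Return a watch body that calls a webhook when errors spike.

    All concrete values (index pattern, fields, threshold, schedule)
    are illustrative assumptions, not the presenters' configuration.
    """
    return {
        "trigger": {"schedule": {"interval": "5m"}},
        "input": {
            "search": {
                "request": {
                    "indices": ["app-logs-*"],  # hypothetical index pattern
                    "body": {
                        "size": 0,
                        "query": {
                            "bool": {
                                "filter": [
                                    {"term": {"application": service}},
                                    {"term": {"level": "error"}},  # hypothetical field
                                    {"range": {"@timestamp": {"gte": "now-5m"}}},
                                ]
                            }
                        },
                    },
                }
            }
        },
        # Fire only when more than 100 matching documents were found.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 100}}},
        "actions": {
            "run_playbook": {
                "webhook": {
                    "scheme": "https",
                    "host": playbook_host,
                    "port": 443,
                    "method": "post",
                    "path": playbook_path,
                    "headers": {"Content-Type": "application/json"},
                    "body": json.dumps({"playbook": "restart", "service": service}),
                }
            }
        },
    }
```

Keeping the definition in code means it can live in the same repository as the playbook it triggers and be deployed through the same pipeline.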

Logging Service Future

Infrastructure
§ Elastic as a Service for use cases other than logging and monitoring
§ Elastic Cloud Enterprise evaluation and pilot

End User Experience
§ Data Rollups. Hot-warm-cold data.
§ Intelligence with Machine Learning
 – automated ops insights and alerting
 – automated intelligent rollouts and rollbacks
§ “Batteries Included”. GitOps.
 – reusable Concur global delivery pipeline workflow libraries
 – automated Beats configurations and deployments
 – automated and codified export, import, deployment of Kibana objects
 – automated and codified export, import, deployment of Watcher objects