Application Logging with Elasticsearch at Naver

Elastic Co
December 12, 2017

Jaeik Lee | Lead In-house Log Management Platform | Naver

Transcript

  1. Agenda
     1. Introduction to NELO
     2. How NELO was using Elasticsearch
     3. Problem of multi-tenant logging platform
     4. Our Solution: Multi-cluster & Multi-indices per day
  2. Introduction to NELO: Features
     • SDK/Agent: log forwarding for various platforms
     • Collect/Search/Analyze logs
     • Real-time/Scheduled alerts
     • Crash log symbolication
     • Own Webapp + Kibana (Dashboard)
     • OpenAPI for custom data processing
  3. Introduction to NELO: Architecture
     [Diagram; components as labeled on the slide]
     • Log sources: mobile/desktop applications (SDK), system log forwarding (Syslog), open-source logging agents
     • Transports: Thrift, HTTP, HTTPS
     • Collector server cluster: Collector Server 1, 2, 3, … N
     • Filter, Convert, Queue Sink
     • Distributed queue: Kafka
     • Search/analysis server, symbolicator, alert server, crash aggregation
     • Metadata DB
     • Webapp & Kibana, NELO OpenAPI
  4. Scale
     • Nodes: 388 (7 instances, 9 clusters)
     • Documents (total number of logs): 2630B
     • Size (total size of logs): 627T
  5. Index Model: Time-based model
     • 1 index per day → daily index lifecycle management
     • Retention time varies by instance (1M, 3M, 2Y, 5Y)
     • One type per project → mappings vary per project
     Example indices: nelo2-2017-08-19, nelo2-2017-08-20, …, nelo2-2017-09-18
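The daily index model above can be sketched as follows. This is a minimal illustration, not NELO's actual code: the helper names are hypothetical, and treating "1M" as 30 days is an assumption. Only the `nelo2-yyyy-MM-dd` naming pattern comes from the slide.

```python
from datetime import date, timedelta

INDEX_PREFIX = "nelo2"  # prefix taken from the index names on the slide


def daily_index_name(day: date) -> str:
    """Build the index name for one day, e.g. nelo2-2017-08-19."""
    return f"{INDEX_PREFIX}-{day:%Y-%m-%d}"


def expired_indices(existing, today: date, retention_days: int):
    """Return the daily indices older than the retention window.

    retention_days is an assumed unit; the slide gives 1M, 3M, 2Y, 5Y.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing:
        if not name.startswith(INDEX_PREFIX + "-"):
            continue  # not one of our daily indices
        try:
            day = date.fromisoformat(name[len(INDEX_PREFIX) + 1:])
        except ValueError:
            continue  # suffix is not a plain yyyy-MM-dd date
        if day < cutoff:
            expired.append(name)
    return expired
```

With one index per day, retention becomes a matter of dropping whole indices rather than deleting individual documents.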
  6. Custom Routing: Depends on project size
     • Use custom routing for both indexing and search
       ‒ Small project: store in only one shard (custom routing: project name)
       ‒ Big project: distribute logs over all shards (default routing)
     [Diagram: three clients writing to index nelo2-2015-09-18, shards 0 to 9]
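The routing rule on this slide can be sketched as below. The function names and the set of big projects are hypothetical, and `zlib.crc32` is only a stand-in: Elasticsearch applies its own hash function to the routing value, so real shard numbers would differ. The shard count of 10 follows the slide's diagram (shards 0 to 9).

```python
import zlib
from typing import Optional, Set

NUM_SHARDS = 10  # the slide's diagram shows shards 0 to 9


def routing_for(project: str, big_projects: Set[str]) -> Optional[str]:
    """Return the custom routing value, or None to fall back to default routing."""
    return None if project in big_projects else project


def shard_for(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard (stand-in hash, not Elasticsearch's)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def shards_used(project: str, doc_ids, big_projects: Set[str]) -> Set[int]:
    """Shards touched when indexing doc_ids for a project."""
    routing = routing_for(project, big_projects)
    # With default routing, Elasticsearch routes on the document ID.
    return {shard_for(routing if routing is not None else doc_id) for doc_id in doc_ids}
```

A small project always hashes the same routing value, so its logs land in one shard and a search with that routing value hits only that shard. A big project routes on document IDs and spreads over all shards.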
  7. Hot-Warm Architecture
     • Recent data in SSD
     [Diagram: Web UI (search query) and Indexer (index query) connect through the Client Node to SSD Data Nodes and HDD Data Nodes; Master Nodes shown separately]
  8. Mapping Explosion: cluster stalls due to mapping updates
     [2017-05-30 21:36:57,773][WARN ][cluster.service ] [elastic09.nelo2] cluster state update task [put-mapping [naver-project],put-mapping [naver-project]] took 5.1m above the warn threshold of 30s
  9. Shard Size Distribution: skewed shards due to routing
     [Chart: shard size distribution; axis labels not recoverable]
  10. Tribe: Introduction
     • A federated client across multiple clusters
     • Limits
       ‒ Cannot handle indices with the same name in multiple clusters
       ‒ Master-level write operations are not allowed
     • Will be replaced with cross cluster search
     [Diagram: Client → Tribe Node → Cluster A, Cluster B]
  11. Tribe: Sample

        cluster.name: es-tribe
        tribe:
          es1:
            cluster.name: es1
            discovery.zen.ping.unicast.hosts: ['10.3.8.76']
          es2:
            cluster.name: es2
            discovery.zen.ping.unicast.hosts: ['10.3.8.75']
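  12. Tribe in NELO
     [Diagram]
     • The Indexer indexes into the Hot Cluster
     • The Webapp searches through the Tribe Nodes, which search hot data in the Hot Cluster and cold data in the Cold Cluster
     • Snapshot and restore go through HDFS
     • Each cluster has its own master nodes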
  13. Index Model Change: Introduction
     • Time-based index model + project-based index model
       ‒ Same policy for daily index creation
       ‒ For big projects, split indices
       ‒ For small projects, share an index
     Before: nelo2-2017-09-18
     After: nelo2-2017-09-18, nelo2-2017-09-18-naverapp, nelo2-2017-09-18-line, nelo2-2017-09-18-band
  14. Index Model Change: Aliases
     • Use aliases regardless of whether a project is indexed in the common index or in its own index.
       ‒ Alias name: <project name>-yyyyMMdd
       ‒ Index name: nelo2-log-yyyy-MM-dd-<project name>
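The alias naming scheme can be sketched as below. The two name patterns come from the slide. The shared index name (`nelo2-log-yyyy-MM-dd` with no project suffix) and the helper names are assumptions for illustration. The returned dictionary follows the `add` action shape of Elasticsearch's `_aliases` API; the slides do not say whether a shared-index alias also carries a filter or routing value, so none is set here.

```python
from datetime import date


def alias_name(project: str, day: date) -> str:
    """Alias pattern from the slide: <project name>-yyyyMMdd."""
    return f"{project}-{day:%Y%m%d}"


def own_index_name(project: str, day: date) -> str:
    """Index pattern from the slide: nelo2-log-yyyy-MM-dd-<project name>."""
    return f"nelo2-log-{day:%Y-%m-%d}-{project}"


def shared_index_name(day: date) -> str:
    """Assumed name of the common index shared by small projects."""
    return f"nelo2-log-{day:%Y-%m-%d}"


def alias_action(project: str, day: date, has_own_index: bool) -> dict:
    """Build one 'add' action for the body of a POST /_aliases request."""
    index = own_index_name(project, day) if has_own_index else shared_index_name(day)
    return {"add": {"index": index, "alias": alias_name(project, day)}}
```

Because clients only ever address the alias, a project can move between the shared index and its own index without any change on the client side.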
  15. Index Model Change: Shard size estimation
     • Shard Count = Estimated Index Size / Shard Size Threshold
       ‒ Estimated Index Size: average of past log sizes
       ‒ Shard Size Threshold: configured (determined by testing)
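The shard count formula can be worked through with a short sketch. The function names and the sample numbers are hypothetical. Rounding up and the minimum of one shard are assumptions, since the slide gives only the ratio.

```python
import math


def estimate_index_size(past_sizes_gb):
    """Estimated index size: the average of past daily index sizes."""
    return sum(past_sizes_gb) / len(past_sizes_gb)


def shard_count(estimated_size_gb: float, shard_size_threshold_gb: float) -> int:
    """Estimated Index Size / Shard Size Threshold, rounded up, at least 1."""
    return max(1, math.ceil(estimated_size_gb / shard_size_threshold_gb))
```

For example, a project whose last three daily indices were 90, 110, and 100 GB has an estimated size of 100 GB. With a 30 GB threshold, `shard_count(100, 30)` gives 4 shards.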
  16. After changing model: shard size distribution
     [Charts: shard size distributions; axis labels not recoverable]