Service monitoring with Prometheus, Grafana, Zipkin, Elasticsearch, and Kibana at the LINE Shop team

Slides from the talk given at LINE Developer Meetup #52, held at LINE Fukuoka on April 17, 2019.

LINE Developers

April 17, 2019

Transcript

  1. Service monitoring with Prometheus/Grafana/Zipkin/Elasticsearch/Kibana at the LINE Shop team
     2019/04/17 LINE Developer Meetup in Fukuoka #52 (https://line.connpass.com/event/126705/)
     LINE Fukuoka Corp., Development 1 Dept, Manabu Matsuzaki
  2. About me
     Manabu Matsuzaki (@matsumana)
     LINE Fukuoka Corp, Development 1 Dept
     SRE/Server side Engineer
     https://github.com/matsumana
  3. Agenda
     • Introduction to the LINE Shop service
     • Introduction to Armeria
       • Integration with Prometheus
       • Integration with Zipkin
     • How do we monitor our services with …
       • Prometheus/Grafana
       • Zipkin
       • Elasticsearch/Kibana
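  4. What is LINE Shop?
     • A platform for selling and using content such as stickers, emoji, and themes in LINE services
       • The Sticker Shop and Theme Shop inside the LINE app
       • LINE STORE on the web (https://store.line.me/)
     • About 4.9 million sets of LINE stickers are on sale (as of April 2019)
     • An average of 433 million stickers are sent per day (as of April 2019)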
  5. LINE Shop architecture
     • LINE DEVELOPER DAY 2018 poster session
       "The Strongest Microservices I Could Think Of: with the LINE Shop Case Study"
       https://twitter.com/LINE_DEV/status/1073068507707789313
  6. Armeria is an open-source asynchronous HTTP/2 RPC/REST client/server library built on top of Java 8, Netty, Thrift and gRPC. Its primary goal is to help engineers build high-performance asynchronous microservices that use HTTP/2 as a session layer protocol.
     https://line.github.io/armeria/
  7. Is there any open-source project using Armeria to refer?
     • https://github.com/line/armeria/issues/1709
     • OSS
       • OpenZipkin Server
       • Curiostack
     • Services
       • LINE
       • Slack
       • Kakao Pay
       • Infostellar
  8. Related features for monitoring
     • Collect metrics with Micrometer and Prometheus
     • Distributed tracing with Zipkin
  9. Armeria server’s metrics
     • Requests
       • total (success, fail)
       • latency for each API (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
       • request size
       • response size
     • Connections from clients
     • Active requests
     • Pending requests
     • Logback (trace, debug, info, warn, error)
     • etc.
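
To make the quantile labels on the latency metrics concrete, here is a minimal standalone sketch of the nearest-rank definition applied to an in-memory sample. It is not how Armeria or Micrometer computes quantiles (metric libraries typically estimate them from histograms rather than sorting raw samples); the class name `LatencyQuantiles` and the sample values are hypothetical.

```java
import java.util.Arrays;

final class LatencyQuantiles {
    /** Returns the q-quantile (0 < q <= 1) of the samples using the nearest-rank method. */
    static double quantile(double[] samples, double q) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (q <= 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Nearest rank is 1-based: the smallest rank covering at least q of the samples.
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical request latencies in milliseconds, with one slow outlier.
        double[] latenciesMs = {12, 15, 11, 240, 14, 13, 18, 16, 17, 19};
        for (double q : new double[] {0.5, 0.99, 1.0}) {
            System.out.printf("q=%.2f -> %.0f ms%n", q, quantile(latenciesMs, q));
        }
    }
}
```

With this sample the median stays low while the high quantiles surface the single slow request, which is why dashboards usually plot several quantiles side by side.
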
  10. Armeria client’s metrics
      • Requests
        • total (success, fail)
        • latency for each API (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
      • Circuit breaker
      • etc.
  11. Sample code from zipkin-armeria-example

      public final class Frontend {
          public static void main(String[] args) {
              final Tracing tracing = TracingFactory.create("frontend");

              // Client for the backend service; the decorator traces outgoing calls.
              final HttpClient backendClient = new HttpClientBuilder("http://localhost:9000/")
                      .decorator(HttpTracingClient.newDecorator(tracing, "backend"))
                      .build();

              // Frontend server that forwards "/" to the backend's "/api".
              final Server server = new ServerBuilder()
                      .http(8081)
                      .service("/", (ctx, res) -> backendClient.get("/api"))
                      .decorator(HttpTracingService.newDecorator(tracing))
                      .decorator(LoggingService.newDecorator())
                      .build();

              server.start().join();
          }
      }

      https://github.com/openzipkin-contrib/zipkin-armeria-example/blob/master/src/main/java/armeria/Frontend.java
  12. Sample code from zipkin-armeria-example

      public final class Backend {
          public static void main(String[] args) {
              final Tracing tracing = TracingFactory.create("backend");

              // Backend server that returns the current date from "/api".
              final Server server = new ServerBuilder()
                      .http(9000)
                      .service("/api", (ctx, res) -> HttpResponse.of(new Date().toString()))
                      .decorator(HttpTracingService.newDecorator(tracing))
                      .decorator(LoggingService.newDecorator())
                      .build();

              server.start().join();
          }
      }

      https://github.com/openzipkin-contrib/zipkin-armeria-example/blob/master/src/main/java/armeria/Backend.java
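
The Frontend and Backend samples rely on tracing decorators to link the two services into one trace. The sketch below illustrates the underlying idea only, using plain maps in place of HTTP headers: the caller writes trace identifiers into B3 headers (Brave's default propagation format) and the callee reads them back. It is not the Armeria or Brave implementation; `B3PropagationSketch`, `SpanContext`, and the helper methods are hypothetical names, and sampling is hard-coded to "1" as an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

final class B3PropagationSketch {
    /** The identifiers that B3 headers carry between services. */
    static final class SpanContext {
        final String traceId;
        final String spanId;
        final String parentSpanId; // null for a root span

        SpanContext(String traceId, String spanId, String parentSpanId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
        }
    }

    /** Random 64-bit ID rendered as 16 lower-case hex characters. */
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    static SpanContext newRoot() {
        return new SpanContext(newId(), newId(), null);
    }

    /** A child keeps the trace ID and records the caller's span as its parent. */
    static SpanContext newChild(SpanContext parent) {
        return new SpanContext(parent.traceId, newId(), parent.spanId);
    }

    /** Client side: copy the context into outgoing request headers. */
    static Map<String, String> inject(SpanContext ctx) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", ctx.traceId);
        headers.put("X-B3-SpanId", ctx.spanId);
        if (ctx.parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", ctx.parentSpanId);
        }
        headers.put("X-B3-Sampled", "1"); // assumption: every request is sampled
        return headers;
    }

    /** Server side: restore the context, or start a new trace if none was sent. */
    static SpanContext extract(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        String spanId = headers.get("X-B3-SpanId");
        if (traceId == null || spanId == null) {
            return newRoot();
        }
        return new SpanContext(traceId, spanId, headers.get("X-B3-ParentSpanId"));
    }

    public static void main(String[] args) {
        SpanContext frontend = newRoot();               // span for the incoming "/" request
        SpanContext clientCall = newChild(frontend);    // span for the call to "/api"
        SpanContext backend = extract(inject(clientCall));
        System.out.println("same trace: " + backend.traceId.equals(frontend.traceId));
    }
}
```

Because both services report spans with the same trace ID, Zipkin can assemble them into a single timeline and show which hop is slow or failing.
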
  13. How we monitor our services
      • Monitor metrics of the OS, middleware, and applications
        • Prometheus + Grafana
      • Investigate which microservices are slow or failing
        • Zipkin
      • Check logs
        • Elasticsearch + Kibana
      • Reporting (preliminary figures)
        • Elasticsearch + Kibana
  14. Visualize with Grafana (OS metrics by node_exporter)
      • Load average
      • CPU usage (system, user, I/O wait)
      • Context switches
      • Memory usage (memory, slab, swap)
      • Disk usage
      • Network traffic (inbound, outbound)
      • etc.
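
As background for the CPU usage panel: node_exporter exposes CPU time as cumulative per-mode counters (`node_cpu_seconds_total` with a `mode` label), so a usage percentage has to be derived from the difference between two scrapes. The sketch below shows that arithmetic in isolation, assuming just four modes and hypothetical counter values; in practice the calculation is usually written as a query in Prometheus rather than in application code, and `CpuUsageSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CpuUsageSketch {
    /** Builds one scrape: cumulative CPU seconds per mode (values are hypothetical). */
    static Map<String, Double> sample(double user, double system, double iowait, double idle) {
        Map<String, Double> s = new LinkedHashMap<>();
        s.put("user", user);
        s.put("system", system);
        s.put("iowait", iowait);
        s.put("idle", idle);
        return s;
    }

    /** Share of elapsed CPU time spent in the given mode between two scrapes (0.0 to 1.0). */
    static double modeShare(Map<String, Double> earlier, Map<String, Double> later, String mode) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : later.entrySet()) {
            total += e.getValue() - earlier.getOrDefault(e.getKey(), 0.0);
        }
        if (total <= 0.0) {
            // No time elapsed, or a counter went backwards; this sketch just reports 0.
            return 0.0;
        }
        double delta = later.getOrDefault(mode, 0.0) - earlier.getOrDefault(mode, 0.0);
        return delta / total;
    }

    public static void main(String[] args) {
        Map<String, Double> t0 = sample(100, 50, 10, 840);
        Map<String, Double> t1 = sample(130, 60, 10, 900);
        for (String mode : t1.keySet()) {
            System.out.printf("%s: %.1f%%%n", mode, 100 * modeShare(t0, t1, mode));
        }
    }
}
```
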
  15. Visualize with Grafana (JVM metrics)
      • GC
        • Pause time (Young, Old)
        • Pause count (Young, Old)
      • Memory (heap)
        • used, committed, max (Eden, Survivor, Old)
      • Memory (non-heap)
        • used, committed, max (Metaspace, Code cache)
      • Thread
        • Thread count, Daemon thread count
      • ClassLoader
        • Loaded classes count, Unloaded classes count
      • etc.
  16. Visualize with Grafana (Armeria metrics)
      • Requests
        • total (success, fail)
        • latency for each API (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
        • request size
        • response size
      • Connections from clients
      • Active requests
      • Pending requests
      • Logback (trace, debug, info, warn, error)
      • etc.
  17. Visualize with Grafana (Cache metrics)
      • Local cache (Caffeine), Redis for cache storage
        • Operation count (get, put)
        • Hit rate
        • size
        • load latency (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
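
The hit rate shown on the cache dashboard is simply hits divided by total lookups. The following minimal sketch keeps the two counters by hand to show the arithmetic; it does not use Caffeine's own statistics API, `CacheStatsSketch` is a hypothetical name, and returning 1.0 when nothing has been looked up yet is a convention chosen here.

```java
import java.util.concurrent.atomic.LongAdder;

final class CacheStatsSketch {
    // LongAdder keeps counting cheap when many request threads update concurrently.
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() {
        hits.increment();
    }

    void recordMiss() {
        misses.increment();
    }

    /** Ratio of lookups served from the cache; 1.0 when there were no lookups. */
    double hitRate() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 1.0 : (double) h / total;
    }

    public static void main(String[] args) {
        CacheStatsSketch stats = new CacheStatsSketch();
        stats.recordHit();
        stats.recordHit();
        stats.recordHit();
        stats.recordMiss();
        System.out.println("hit rate: " + stats.hitRate());
    }
}
```
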
  18. Visualize with Grafana (Redis client)
      • Command count
        • GET
        • HGET
        • HMGET
        • SET
        • HSET
        • HMSET
        • ZRANGE
        • etc.
  19. Visualize with Grafana (MongoDB client)
      • Requests
        • total (success, fail)
        • latency for each request (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
  20. Visualize with Grafana (Elasticsearch client)
      • Requests
        • total (success, fail)
        • latency for each request (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
  21. Collected data
      • Application logs (via Logback)
      • User operation logs
        • Product search
          • Search keyword
        • Product browsing
          • product, type, country, gender, age
        • Product purchase event (preliminary figures)
          • product, type, country, gender, age
          • sales (compared to yesterday, last week, last month)
      • Elasticsearch slow logs (collected with Fluentd)
      • etc.
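
The sales comparison against yesterday, last week, and last month comes down to a percentage change against a baseline. The sketch below shows only that arithmetic with hypothetical figures; on the slides the comparison is presented in Kibana, and nothing here reflects how the LINE Shop team actually computes it. `SalesComparisonSketch` and `percentChange` are hypothetical names.

```java
import java.util.OptionalDouble;

final class SalesComparisonSketch {
    /**
     * Percentage change of current sales relative to a baseline period.
     * Returns empty when the baseline is zero, because the change is undefined.
     */
    static OptionalDouble percentChange(long current, long baseline) {
        if (baseline == 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((current - baseline) * 100.0 / baseline);
    }

    public static void main(String[] args) {
        long today = 120; // hypothetical sales figures
        long yesterday = 100;
        long lastWeek = 150;
        System.out.println("vs yesterday: " + percentChange(today, yesterday));
        System.out.println("vs last week: " + percentChange(today, lastWeek));
    }
}
```
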