Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Treasure Data Summer Intern 2015 Final Report

mururu
September 30, 2015
3.1k

Treasure Data Summer Intern 2015 Final Report

mururu

September 30, 2015
Tweet

Transcript

  1. Who am I? • Yuki Ito • Master’s student, Information

    Science and Technology, The University of Tokyo • msgpack-erlang and fluent-logger-erlang maintainer 
  2. Problem • Some storages and platforms (Elasticsearch, GCP…) expect/want sub-second

    (millisecond, nanosecond) as timestamp. • But current Fluentd timestamp cannot hold sub-second. • If we support nanosecond resolution, it will be able to cover all requirements because it is minimum resolution generally. (but there will be a little more overhead than millisecond/microsecond) 
  3. Implementation - EventTime • Two attributes • @sec: second integer

    (same with current Timestamp) • @nsec: nanosecond integer • In most cases, it behaves just like current Timestamp • It is serialized as MessagePack Ext type • Fluent::Engine.now and built-in parsers returns EventTime 
  4. Backward compatibility 1 • There are many plugins. They must

    works fine just as it is. • EventTime behaves like current Timestamp in most cases. • If a plugin doesn’t want to loose sub-second, it may need additional code to handle EventTime. • I have checked many many plugins. 
  5. Backward compatibility 2 • To keep sub-second resolution across nodes

    (forward plugins), external data format (MessagePack) have to serialize EventTime as a different type from old timestamp, but this may breaks old nodes. • Introduced time_as_integer option to output forward plugin to force timestamp to be serialized as Integer (same with the current timestamp). 
  6. Performance concerns 1 • When time_as_integer is true and output

    forward plugin receives PackedForward, it is deserialized and serialized for converting EventTime to Integer. • By keeping source nodes old or keeping time_as_integer true on them, we can set false to time_as_integer on relay node. 
  7. Performance concerns 2 • If timestamp format of logs include

    sub- second part, it is difficult to cache results of Time.strptime (Time.strptime is heavy task). • Introduced strptime gem, which can precompile a format string. 
  8. Benchmark Results - strptime • Measured in_tail parsing performance •

    Used dummer/flowcounter_simple • machine spec • CPU: Core i5 5250U • Memory: 8GB • Disc: SSD 256G • OS: OSX Yosemite DBDIF OPDBDIF OPDBDIF XJUITUSQUJNF JO@UBJM MJOFTTFD       
  9. Summary • Current Timestamp have only second resolution. • Some

    storages and platforms want nano(milli)second resolution. • I introduced new Timestamp which can have nanosecond resolution called EventTime. 
  10. Future work • Documentations for users and plugin developers. •

    “This branch will be merged in master branch about next month.” by Fluentd commiters 
  11. What for? - Perfect Monitor • Visualize used computing resources

    for customers in near real-time • e.g. number of records/bytes over event collector
 number of running jobs
 number of allocated CPU cores for a particular job • Reduce support cost by making customers to understand how they are using our computing resources • Make TD staffs known what/how our customers are doing data processing 
  12. What is? - Perfect Monitor • Dashboard for customers to

    know how they are using our computing resources • Collector for various metrics sent from workers • Storages to store metrics (InfluxDB/TD) • API server to handle requests from dashboard and query backend storage • Dashboard Application 
  13. Use Cases 1 - number of records -  •

    Almost system administrator doesn't know about how many logs they are generating. • The number of logs are affected by many events, like release of new services, new versions of apps, and so on.
  14. 

  15. Use Cases 2 - number of running tasks - 

    • Customers doesn’t know how many are CPU cores is used by their each jobs now.
  16. Use Cases 3 - support/sales team side -  •

    A support/sales engineer found the cause of problems more easily.
  17. .POJUPSJOH
 4FSWFS )BEPPQ 8PSLFS 1SFTUP 8PSLFS "1* 4FSWFS 5% 1SFTUP

    *OqVY%# "1* 4FSWFS 3FEJT %BTICPBSE send metrics 6TFS query cache  Architecture query (only old data)
  18. .POJUPSJOH
 4FSWFS )BEPPQ 8PSLFS 1SFTUP 8PSLFS "1* 4FSWFS 5% 1SFTUP

    *OqVY%# send metrics  How to collect metrics • Workers send metrics to monitoring Server. • Monitoring Server filters and aggregates (if needed) metrics, then store them to InfluxDB and TD.
  19. 5% 1SFTUP *OqVY%# "1* 4FSWFS 3FEJT %BTICPBSE query (only old

    data) query cache  How to show metrics • Dashboard asks API Server for metrics based on its configuration. • API Server queries InfluxDB or TD(Presto) based on the time window. • Dashboard renders graph for query results.
  20. .POJUPSJOH
 4FSWFS )BEPPQ 8PSLFS 1SFTUP 8PSLFS "1* 4FSWFS 5% 1SFTUP

    *OqVY%# "1* 4FSWFS 3FEJT %BTICPBSE send metrics 6TFS query cache  The Point of Architecture 1 query (only old data)
  21. The Point of Architecture 1 • Perfect Monitor store same

    data to InfluxDB and TD(presto) • InfluxDB hold only recent data(e.g. 30 days) • TD(presto) hold all data • Why I chose InfluxDB? • It is fast enough. • We can write time-series data in a hash and query it on the fly flexibly, so it makes trial and error easy.  5% 1SFTUP *OqVY%# query query store store
  22. .POJUPSJOH
 4FSWFS )BEPPQ 8PSLFS 1SFTUP 8PSLFS "1* 4FSWFS 5% 1SFTUP

    *OqVY%# "1* 4FSWFS 3FEJT %BTICPBSE send metrics 6TFS query cache  The Point of Architecture 2 query (only old data)
  23. The Point of Architecture 2 • API server interprets queries

    from dashboard and queries InfluxDB or TD(presto) • Only when old data is needed, API Server queries TD(presto), otherwise InfluxDB • It can cache query results in redis.  "1* 4FSWFS 3FEJT query cache query (only old data)
  24. Failure case 1 • InfluxDB tends to be overloaded •

    When InfluxDB is down, just launch new InfluxDB node and load data from TD(presto).  5% 1SFTUP OFX*OqVY%# restore
  25. Failure case 2 • When InfluxDB is down or its

    response is delayed, API Server query TD(presto).  5% 1SFTUP *OqVY%# "1* 4FSWFS query
  26. For operation • The API server returns results even if

    happening InfluxDB degradation. • Because then API server uses TD(Presto) automatically. • To add new metrics, you just need to add configuration to monitoring server and dashboard application. • Of course, workers need to be able to send the metrics. 
  27. Summary • Show more metrics about computing resources to customers

    for customers and TD. • Perfect Monitor makes it easy. 
  28. Impression of Summer Intern • I have thought about “data“

    seriously everyday. This is my first experience. So excited. • I have learned a lot by trying both of OSS and TD internal task.