Data Engineering

Surasit Liangpornrattana

February 27, 2019

Transcript

  1. WHERE?
  2. USE CASE - PAGES? - WHO?

    INTEREST? - EVENTS? - MORE ...
  3. CENTRALIZED LOGGING

    FLUENTD - INPUT - FILTER
  4. FLUENTD

    ACCESS LOG - IP - TIMESTAMP - REQUEST - USER-AGENT
    FILTER - TRANSFORMS - FILTER - ENRICH
    BUFFER (CHUNKS) - PERFORMANCE - RELIABILITY - THREAD SAFETY
    JSON - IP: ... , TIMESTAMP: ... , REQUEST: ... , USER-AGENT: ...
    OUTPUT - WRITE OR SEND LOGS - SYNC OR ASYNC
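
The access-log-to-JSON step on this slide can be sketched in plain Python. This is only an illustration of the transformation, not Fluentd configuration or the speaker's setup: the Combined Log Format pattern, the `parse_access_log` helper, and the sample line are all assumptions chosen for the example.

```python
import json
import re

# Assumed input format: Combined Log Format. The slide does not say which
# format the real access logs use, so this pattern is illustrative only.
ACCESS_LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'\d{3} \S+ '
    r'"[^"]*" '
    r'"(?P<user_agent>[^"]*)"'
)


def parse_access_log(line):
    """Return the four fields named on the slide as a dict, or None if the line does not match."""
    match = ACCESS_LOG_PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()


# Hypothetical sample line, made up for this sketch.
sample_line = (
    '203.0.113.7 - - [27/Feb/2019:10:15:32 +0700] '
    '"GET /pages/42 HTTP/1.1" 200 512 "-" "Mozilla/5.0"'
)
record = parse_access_log(sample_line)
print(json.dumps(record))
```

Returning `None` for lines that do not match keeps the filtering decision (drop, or route elsewhere) separate from the parsing itself.
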
  5. BUFFER

    KAFKA - HIGH THROUGHPUT - REPLAY - CENTRALIZE - FAULT TOLERANCE
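
The REPLAY property listed here comes from keeping records in an append-only log that consumers read by offset. The toy class below sketches that idea under loud assumptions: `ReplayableLog` and its methods are hypothetical names for this example and are not Kafka's API, and the log lives in memory only.

```python
class ReplayableLog:
    """Toy append-only log. NOT Kafka's API, only the idea of offsets and replay."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return a copy of every record at or after the offset; reading removes nothing."""
        return list(self._records[offset:])


log = ReplayableLog()
for event in ["page_view", "click", "purchase"]:
    log.append(event)

first_pass = log.read_from(0)  # a consumer reading from the start
replayed = log.read_from(1)    # a consumer rewinding to offset 1
```

Because reads do not delete anything, a downstream job that failed or changed its logic can rewind to an earlier offset and process the same records again.
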
  6. KAFKA

    BYTES OF SERIALIZED JSON - TEAM - REPLAY
  7. PROTOCOLS & SERIALIZATION

    LOGSTASH - INPUT - KAFKA - FILTER - OUTPUT - PROTOBUF
    STANDARDIZE - PROTOBUF
  8. LOGSTASH

    INPUT - BYTES OF SERIALIZED JSON → PROTOBUF
    BUFFER - PERFORMANCE - RELIABILITY - THREAD SAFETY
    PAGE ← HEAD - PAGE ← TAIL - PAGE ← TAIL
    FILTER - TRANSFORM - FILTER - ENRICH
    OUTPUT - CODEC - WRITE OR SEND LOGS
  9. PROTOBUF

    SMALL → FAST - SIMPLE, KEY-VALUE - STRUCTURED DATA - SUPPORT MANY LANGUAGES
    LOG SERVER - PROTOBUF - LOGSTASH
  10. SCHEDULING JOBS - FOR BATCH DATA PROCESSING

    AIRFLOW - WORKFLOW - SCHEDULER - MONITORING
    PROTOBUF
  11. TASKS

    TASK 3 - TASK 3.2 - TASK 3.3 - TASK 4 - SPARK
    IDEMPOTENT - STATELESS - PREFER INSERT TO UPDATE - PARTITION BY TIME
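
One way to read the task properties on this slide is that a retried run should replace its own time partition rather than append to or update shared state. The sketch below assumes that reading; the `write_partition` helper, the `dt=YYYY-MM-DD` directory layout, and the sample rows are hypothetical and use only the Python standard library, not Airflow or Spark.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def write_partition(base_dir, run_date, rows):
    """Write rows for one day, replacing whatever a previous run left behind."""
    # Partition by time: each run date owns its own directory (assumed layout).
    partition_dir = Path(base_dir) / f"dt={run_date.isoformat()}"
    partition_dir.mkdir(parents=True, exist_ok=True)
    target = partition_dir / "part-0000.json"
    tmp = partition_dir / "part-0000.json.tmp"
    # Stateless: the output depends only on the arguments, not on earlier runs.
    tmp.write_text("\n".join(json.dumps(row) for row in rows))
    tmp.replace(target)  # swap the finished file in as a single rename
    return target


base_dir = tempfile.mkdtemp()
rows = [{"user": "a", "page": "/home"}, {"user": "b", "page": "/cart"}]
run_date = date(2019, 2, 27)

first = write_partition(base_dir, run_date, rows)
second = write_partition(base_dir, run_date, rows)  # retry of the same run
partition_files = sorted(p.name for p in first.parent.iterdir())
```

Running the task twice for the same date leaves exactly one file with the same contents, so a scheduler can retry or backfill a day without producing duplicates.
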
  12. LOG SERVER

    KAFKA - ELASTICSEARCH - KIBANA - PROTOBUF - HADOOP
  13. LOG SERVER - INPUT - STREAMING - SERVING - OUTPUT

    PROTOBUF - DEFINED PROTOBUF SCHEMA - STORAGE - HDFS + HIVE (SERVING)
  14. DATA PIPELINE

    BATCH - ACQUISITION - PROCESSING & INGESTION - STORAGE - PROTOBUF - ACCESS
    STREAMING - PROCESSING & INGESTION
  15. DATA LAKE - GATEWAY - USER
  16. Q&A