Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Talk on Database (ja)

Talk on Database (ja)

筑波大学の2013年度の情報システム特別講義Dのスライド

Avatar for UENISHI Kota

UENISHI Kota

January 31, 2014
Tweet

More Decks by UENISHI Kota

Other Decks in Technology

Transcript

  1. ͲͪΒ༷Ͱ͔͢ʁ •  @kuenishi •  NTTݚڀॴ ˠ Basho •  Erlang/OTPΛॻ͍ͯ࢓ࣄΛ͍ͯ͠·͢ • 

    ෼ࢄσʔλϕʔεɾ෼ࢄγεςϜͷ঎༻։ൃྺ 6೥ •  ϛυϧ΢ΣΞతͳ΋ͷ͕ൺֱతಘҙͰ͢
  2. ྺ্࢙ͷσʔλϕʔε •  IDS (GE, 1963) •  ֊૚ܕDB •  IMS (IBM,

    1966 -), •  ωοτϫʔΫܕDB •  CODASYL (1969, COBOL), •  ࠓ΋ಈ͍͍ͯΔ੡඼΋͋Δ 8
  3. 1977೥ System R (IBM) •  ؔ܎୅਺ͷཧ࿦ʹج͍ͮͨσʔλૢ࡞ݴޠ SQLΛ࣮૷ͨ͠ੈքॳͷσʔλϕʔε •  Ұ؏ͨ͠σʔλʹର͢Δෳ਺ͷૢ࡞Λશͯ ੒ޭɾશࣦͯഊʹ·ͱΊɺෳ਺ͷߋ৽ཁٻ

    Λฒߦ੍ޚ͠ɺσʔλΛӬଓԽ͢Δτϥϯ βΫγϣϯॲཧΛ࣮૷ͨ͠ੈքॳͷσʔλ ϕʔε •  ۀ຿ॲཧͷϞσϧʹඇৗʹΑ͘Ϛονͨ͠ ͨΊɺരൃతʹීٴɹˠISOͰඪ४Խ •  1983೥ Oracle v3 •  1987೥ Sybase SQL Server (ݱMicrosoft SQL Server) 9
  4. Web Scale΁ͷΞϓϩʔν •  Ոఉ༻ίϯϐϡʔλͱಉ͡HW •  LAMP + Memcached(2003) •  Redis

    (2009) •  Google: GFS(2003), BigTable(2006) •  Hadoop (2006), HBase (2006) •  Amazon: Dynamo(2006) •  Cassandra (2008), Riak (2009) 14
  5. 18

  6. 21 Locality vs Load Balancing - Use Cases • ઌಡΈΛޮ͔ͤͯόϧΫͰγʔέϯγϟϧΞ

    Ϋηε • - HBase, BigTable, etc • ϋογϡͰ෼ࢄͤͯ͞ෛՙ෼ࢄͯ͠ϥϯμϜ ΞΫηε • - Riak, Cassie, etc
  7. ੔߹ੑʹ͸2ͭͷจ຺͕͋Δ •  ڞ௨͢Δͷ͸ʮ୭͕ݟͯ΋ಉ͡Α͏ʹݟ͑Δ͜ͱʯ •  ෳ਺छྨͷσʔλؒͷInvariant͕कΒΕ͍ͯΔ͜ͱ •  “ACID” తͳAnomaly͕ൃੜ͠ͳ͍͜ͱ •  Phantom

    Read, Write Skew, etc etc… •  ෳ਺ͷσʔλͷίϐʔ͕ಉ͡Ͱ͋Δ͜ͱ •  ෳ਺ͷίϐʔΛ҆શʹߋ৽Ͱ͖Δ͜ͱ •  ผʑͷਓͰ΋ಉ͡σʔλΛݟΕ͍ͯΔ͜ͱ
  8. ނোతࠔΔ͜ͱͷ෼ྨ Crash Failure ɹ (fail-stop, fail-safe) ނোͨ͠Βࢭ·Δɺࢭ·ͬͨ͜ͱ͕͢ ͙ʹ෼͔Δ Crash Failure

    (fail-silent) ໧ͬͯࢮ͵ Crash Recovery ނোͨ͠ϑϦͯ͠஌ΒΜ΀Γͯ͠ؼͬͯ ͘Δ Omission Failure (receive / send) ϝοηʔδܽམ Timing Failure ·ͱ΋ͳ଎౓Ͱಈ͔ͳ͘ͳΔ Response Failure Ϩεϙϯε͕͓͔͍͠ʢσʔλ͕յΕͯ ͍Δetcʣ Arbitrary Failure Ϗβϯνϯނোɺѱҙͷ͋ΔϠπ͕͍Δ 31
  9. 34 What is PAXOS? – Example   Two phase election

    – phase 1  Larger n is prior 21:32 8ß: ¼ ÛËÙêîĝ 21:36 O2Ě 21:35 [_ÕðJK 21:36 O2ÙÐÄorz 21:38 O2ÙÐÞorz n=3 n=1 n=2 8
  10. 35 What is PAXOS? – Example   Two phase election

    – phase 2  confirmation 21:42 Ãğ 21:41 O2ÙÀÀ ëÞĝĚ 21:43 a~Å… 21:44 Ãğ 21:43 GJ proposer 21:43 ÀêO2ÚyÁËÚ Ù|N`çÜßÙ n=4 9
  11. ͱΓ͋͑ͣॻ͍͓͍ͯͯ ॻ͖ࠐΈ͕িಥͨ͠Βʁ •  ྆ํ͓͍࣋ͬͯͯɺিಥͨ͠σʔ λͷʮҼՌؔ܎ʯΛ໌Β͔ʹ͢Δ •  “Vector Clock” •  {a:

    1, b:2, c:1} ͱ {a: 2, b:2, c:1} •  {a: 1, b:2, c:1} ͱ {a: 1, c:1, d:1} •  িಥͨ͠΋ͷ͸ΞϓϦͰղܾ 43 w1 w2 r
  12. 44 CRDT • CRDT (Conflict-Free Replicated Data Types • “AP”

    Λαϙʔτ • Counter, Register, Sets, Maps • →ผεϥΠυʁ
  13. 45 CRDTͰͷRead • ॱ൪͕ೖΕସΘͬͯ΋݁Ռ͕มΘΒͳ͍ܕ • update(w1, update(w2, Data0) = update(w2,

    update(w1, Data0) = Data w1 w1 w1 w2 w2 w2 Actor 0 Actor 1 Actor 2 w1(w2(Data0)) => Data w1(w2(Data0)) => Data w2(w1(Data0)) => Data
  14. ACID vs BASE ACID BASE ੔߹͍ͯ͠ͳͯ͘΋ ৗʹσʔλʹΞΫη εͰ͖Δ Basically Avaiable

    Atomicity ෳ਺ͷૢ࡞ͷ੒ޭɾ ࣦഊΛ·ͱΊΔ Consistency ৑௕Խ͞Εͨσʔλ ΍ෳ਺ͷϦϨʔγϣ ϯ͕ৗʹ੔߹͍ͯ͠ Δ ࠷ऴతʹ੔߹ͨ͠ঢ় ଶʹͳΔ͜ͱ͕อূ ͞Ε͍ͯΕ͹Α͍ Eventually Consistent Isolation ฒߦ؅ཧͯ͠ߋ৽్ தͷঢ়ଶΛݟͤͳ͍ Durability σʔλΛӬଓԽͯ͠ ࣦΘͳ͍ ϨϓϦΧ͸ܾఆ࿦త Ͱ͸ͳ͘ɺ֬཰తͰ ͋ͬͨΓɺάϩʔό ϧʹҰ؏͍ͯ͠ͳ͘ ͯ΋Α͍ Soft-state 46
  15. ΫΠζɿͲ͜·Ͱ͕DB? Ͳ͔͜Β͕NoSQL? ACIDͳτϥϯβΫγϣϯ τϥϯβΫγϣϯͳ͠ SQLΠϯλʔϑ Σʔε Oracle MySQL PostgreSQL SQL

    Server, DB2 Cassandra (CQL) Hive, Presto Impala ಠࣗAPI InnoDB BerkeleyDB FoundationDB Riak MongoDB Redis 48
  16. CAPఆཧ͔ΒΈͨ੡඼ͷ෼ྨ •  CAॏࢹ •  RDBMS, MongoDB, HBase, etc… •  ੔߹ੑҡ࣋ͷͨΊͷ෼ࢄ߹ҙʢਖ਼͘͠ϑΣΠϧΦʔ

    όʔ͢ΔͨΊʣͷ࢓૊Έ͕ඞཁ •  APॏࢹ •  Cassandra, Riak, CouchDB •  ෼ࢄ߹ҙͷ࢓૊Έ͕ෆཁʹ࣮૷΋ӡ༻΋؆୯ 50
  17. ϦϨʔγϣφϧϞσϧΛఘΊΔ 52 •  ςʔϒϧߏ଄ => KVS (Mapߏ଄) •  Pkey͸ΧϥϜͷͻͱ͔ͭΒબͿ΋ͷ=> ඞਢͷ΋ͷ

    •  JOINΛͤ͞ͳ͍ •  ͜ΕʹΑΓεέʔϧΞ΢τܕͷ෼ࢄ͕Մೳʹ •  εΩʔϚΛࣄલఆٛ͠ͳ͍
  18. εΩʔϚ: σʔλϕʔεͷσʔλߏ଄ΛܾΊΔ΋ͷ •  εΩʔϚΛࣄલʹఆٛ: ϦϨʔγϣφϧϞσϧʹԊͬͨ࢖͍ํ •  Pros: σʔλߏ଄ʹڧྗͳ੍໿Λ՝ͨ͢Ί࠷దԽ͠΍͍͢ˍΞϓϦ Λ։ൃ͠΍͍͢ • 

    Cons: ΞϓϦͷมߋίετ͕ߴ͍ˍ࠷ॳʹద੾ʹઃܭ͠ͳ͍ͱ͔ͳ Γมߋʹऑ͍ •  εΩʔϚΛࣄલʹܾΊΔඞཁ͕ͳ͍৔߹ •  Pro: σʔλߏ଄Λޙ͔ΒͰ΋͔ͳΓॊೈʹมߋͰ͖Δ •  Con: ςετ͕ͳ͍ͱόά͕ग़΍͍͢ɺσʔλߏ଄͕ෳࡶͩͱ࠷ద Խ͠ʹ͍͘ 53
  19. ৽͍͠σʔλϞσϧ ΧϥϜࢦ޲ •  ΧϥϜΛ͍͘ΒͰ΋૿΍ ͢͜ͱ͕Ͱ͖Δ •  Query LanguageΛ࡞Δ •  ʮΧϥϜϑΝϛϦʔʯ

    •  HBase, Cassandra υΩϡϝϯτࢦ޲ •  ݸʑͷϨίʔυ͕ࣗ༝ͳ ܗࣜΛ࣋ͭ •  MapReduceͰΫΤϦ •  XML, JSON, etc … •  CouchDB, MongoDB 54
  20. 58

  21. ͍··Ͱઆ໌͖ͯͨ͠ͷͰ؆୯ ໰୊ Riak͸Ͳ͏͍ͯ͠Δ͔ εέʔϦϯά ϋογϡతϧʔςΟϯάͰGossipͳਫฏ෼ࢄ ੔߹ੑͷ؅ཧ Vector Clocks, CRDTͳͲRead࣌ʹղܾ 2.0Ͱ͸Paxos΋ೖΔ

    Մ༻ੑͷอূ Hinted HandoffΛ࢖ͬͯৗʹॻ͖ࠐΈ͕Ͱ͖ΔΑ͏ ʹ͍ͯ͠Δ σʔλϞσϧ Blob + Document Based ΠϯσοΫεΛ͸ΕΔ MapReduce͕Ͱ͖Δ ॲཧܥ Erlang VM 59
  22. ࣮૷໘ͰͷRiakͷಛ௃ •  ʮ໷தʹΞϥʔτͰى͜͞Εͳ͍ʯ͜ͱΛ໨ࢦͨ͠ •  Erlang VMΛ࢖༻ •  GCʹΑΔఀࢭ͕࣌ؒͳ͍ •  ແఀࢭͰͷύονద༻ɺղੳɺૢ࡞͕Մೳ

    •  ίϚϯυମܥɾϑΝΠϧߏ଄ɾσʔλߏ଄ΛͳΔ΂͘γϯϓϧʹઃ ܭ͠ӡ༻ෛ୲Λ௿ݮ •  ҆ఆੑΛॏࢹͨ͠ઃܭ •  ίϯύΫγϣϯɺ༧ଌՄೳੑ 60
  23. ࣮૷໘΋ߟྀͨ͠ ੡඼બ୒ͷϙΠϯτ •  JOINͳͲσʔλͷਖ਼نԽ͕ඞཁͳͱ͖: RDBMS •  ΦϯϝϞϦͰ࠷৽ͷ࣌ܥྻσʔλΛݟ͍ͨͱ͖: MongoDB •  ΧϥϜΛ͍͘ΒͰ΋૿΍͍ͨ͠ͱ͖

    •  SQL෩ͷΫΤϦΛ͍ͨ͠ͱ͖ or Մ༻ੑ: Cassandra •  େྔσʔλʹޮ཰తʹόονॲཧΛ͍ͨ͠ͱ͖: Hbase •  σʔλΛͳ͘͞ͳ͍&μ΢ϯλΠϜΛͳ͍ͨ͘͠ͱ͖: Riak 62