Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Talk on Database (ja)

Talk on Database (ja)

筑波大学の2013年度の情報システム特別講義Dのスライド

Avatar for UENISHI Kota

UENISHI Kota

January 31, 2014

More Decks by UENISHI Kota

Other Decks in Technology

Transcript

  1. ͲͪΒ༷Ͱ͔͢ʁ •  @kuenishi •  NTTݚڀॴ ˠ Basho •  Erlang/OTPΛॻ͍ͯ࢓ࣄΛ͍ͯ͠·͢ • 

    ෼ࢄσʔλϕʔεɾ෼ࢄγεςϜͷ঎༻։ൃྺ 6೥ •  ϛυϧ΢ΣΞతͳ΋ͷ͕ൺֱతಘҙͰ͢
  2. ྺ্࢙ͷσʔλϕʔε •  IDS (GE, 1963) •  ֊૚ܕDB •  IMS (IBM,

    1966 -), •  ωοτϫʔΫܕDB •  CODASYL (1969, COBOL), •  ࠓ΋ಈ͍͍ͯΔ੡඼΋͋Δ 8
  3. 1977೥ System R (IBM) •  ؔ܎୅਺ͷཧ࿦ʹج͍ͮͨσʔλૢ࡞ݴޠ SQLΛ࣮૷ͨ͠ੈքॳͷσʔλϕʔε •  Ұ؏ͨ͠σʔλʹର͢Δෳ਺ͷૢ࡞Λશͯ ੒ޭɾશࣦͯഊʹ·ͱΊɺෳ਺ͷߋ৽ཁٻ

    Λฒߦ੍ޚ͠ɺσʔλΛӬଓԽ͢Δτϥϯ βΫγϣϯॲཧΛ࣮૷ͨ͠ੈքॳͷσʔλ ϕʔε •  ۀ຿ॲཧͷϞσϧʹඇৗʹΑ͘Ϛονͨ͠ ͨΊɺരൃతʹීٴɹˠISOͰඪ४Խ •  1983೥ Oracle v3 •  1987೥ Sybase SQL Server (ݱMicrosoft SQL Server) 9
  4. Web Scale΁ͷΞϓϩʔν •  Ոఉ༻ίϯϐϡʔλͱಉ͡HW •  LAMP + Memcached(2003) •  Redis

    (2009) •  Google: GFS(2003), BigTable(2006) •  Hadoop (2006), HBase (2006) •  Amazon: Dynamo(2006) •  Cassandra (2008), Riak (2009) 14
  5. 18

  6. 21 Locality vs Load Balancing - Use Cases • ઌಡΈΛޮ͔ͤͯόϧΫͰγʔέϯγϟϧΞ

    Ϋηε • - HBase, BigTable, etc • ϋογϡͰ෼ࢄͤͯ͞ෛՙ෼ࢄͯ͠ϥϯμϜ ΞΫηε • - Riak, Cassie, etc
  7. ੔߹ੑʹ͸2ͭͷจ຺͕͋Δ •  ڞ௨͢Δͷ͸ʮ୭͕ݟͯ΋ಉ͡Α͏ʹݟ͑Δ͜ͱʯ •  ෳ਺छྨͷσʔλؒͷInvariant͕कΒΕ͍ͯΔ͜ͱ •  “ACID” తͳAnomaly͕ൃੜ͠ͳ͍͜ͱ •  Phantom

    Read, Write Skew, etc etc… •  ෳ਺ͷσʔλͷίϐʔ͕ಉ͡Ͱ͋Δ͜ͱ •  ෳ਺ͷίϐʔΛ҆શʹߋ৽Ͱ͖Δ͜ͱ •  ผʑͷਓͰ΋ಉ͡σʔλΛݟΕ͍ͯΔ͜ͱ
  8. ނোతࠔΔ͜ͱͷ෼ྨ Crash Failure ɹ (fail-stop, fail-safe) ނোͨ͠Βࢭ·Δɺࢭ·ͬͨ͜ͱ͕͢ ͙ʹ෼͔Δ Crash Failure

    (fail-silent) ໧ͬͯࢮ͵ Crash Recovery ނোͨ͠ϑϦͯ͠஌ΒΜ΀Γͯ͠ؼͬͯ ͘Δ Omission Failure (receive / send) ϝοηʔδܽམ Timing Failure ·ͱ΋ͳ଎౓Ͱಈ͔ͳ͘ͳΔ Response Failure Ϩεϙϯε͕͓͔͍͠ʢσʔλ͕յΕͯ ͍Δetcʣ Arbitrary Failure Ϗβϯνϯނোɺѱҙͷ͋ΔϠπ͕͍Δ 31
  9. 34 What is PAXOS? – Example   Two phase election

    – phase 1  Larger n is prior 21:32 8ß: ¼ ÛËÙêîĝ 21:36 O2Ě 21:35 [_ÕðJK 21:36 O2ÙÐÄorz 21:38 O2ÙÐÞorz n=3 n=1 n=2 8
  10. 35 What is PAXOS? – Example   Two phase election

    – phase 2  confirmation 21:42 Ãğ 21:41 O2ÙÀÀ ëÞĝĚ 21:43 a~Å… 21:44 Ãğ 21:43 GJ proposer 21:43 ÀêO2ÚyÁËÚ Ù|N`çÜßÙ n=4 9
  11. ͱΓ͋͑ͣॻ͍͓͍ͯͯ ॻ͖ࠐΈ͕িಥͨ͠Βʁ •  ྆ํ͓͍࣋ͬͯͯɺিಥͨ͠σʔ λͷʮҼՌؔ܎ʯΛ໌Β͔ʹ͢Δ •  “Vector Clock” •  {a:

    1, b:2, c:1} ͱ {a: 2, b:2, c:1} •  {a: 1, b:2, c:1} ͱ {a: 1, c:1, d:1} •  িಥͨ͠΋ͷ͸ΞϓϦͰղܾ 43 w1 w2 r
  12. 44 CRDT • CRDT (Conflict-Free Replicated Data Types • “AP”

    Λαϙʔτ • Counter, Register, Sets, Maps • →ผεϥΠυʁ
  13. 45 CRDTͰͷRead • ॱ൪͕ೖΕସΘͬͯ΋݁Ռ͕มΘΒͳ͍ܕ • update(w1, update(w2, Data0) = update(w2,

    update(w1, Data0) = Data w1 w1 w1 w2 w2 w2 Actor 0 Actor 1 Actor 2 w1(w2(Data0)) => Data w1(w2(Data0)) => Data w2(w1(Data0)) => Data
  14. ACID vs BASE ACID BASE ੔߹͍ͯ͠ͳͯ͘΋ ৗʹσʔλʹΞΫη εͰ͖Δ Basically Avaiable

    Atomicity ෳ਺ͷૢ࡞ͷ੒ޭɾ ࣦഊΛ·ͱΊΔ Consistency ৑௕Խ͞Εͨσʔλ ΍ෳ਺ͷϦϨʔγϣ ϯ͕ৗʹ੔߹͍ͯ͠ Δ ࠷ऴతʹ੔߹ͨ͠ঢ় ଶʹͳΔ͜ͱ͕อূ ͞Ε͍ͯΕ͹Α͍ Eventually Consistent Isolation ฒߦ؅ཧͯ͠ߋ৽్ தͷঢ়ଶΛݟͤͳ͍ Durability σʔλΛӬଓԽͯ͠ ࣦΘͳ͍ ϨϓϦΧ͸ܾఆ࿦త Ͱ͸ͳ͘ɺ֬཰తͰ ͋ͬͨΓɺάϩʔό ϧʹҰ؏͍ͯ͠ͳ͘ ͯ΋Α͍ Soft-state 46
  15. ΫΠζɿͲ͜·Ͱ͕DB? Ͳ͔͜Β͕NoSQL? ACIDͳτϥϯβΫγϣϯ τϥϯβΫγϣϯͳ͠ SQLΠϯλʔϑ Σʔε Oracle MySQL PostgreSQL SQL

    Server, DB2 Cassandra (CQL) Hive, Presto Impala ಠࣗAPI InnoDB BerkeleyDB FoundationDB Riak MongoDB Redis 48
  16. CAPఆཧ͔ΒΈͨ੡඼ͷ෼ྨ •  CAॏࢹ •  RDBMS, MongoDB, HBase, etc… •  ੔߹ੑҡ࣋ͷͨΊͷ෼ࢄ߹ҙʢਖ਼͘͠ϑΣΠϧΦʔ

    όʔ͢ΔͨΊʣͷ࢓૊Έ͕ඞཁ •  APॏࢹ •  Cassandra, Riak, CouchDB •  ෼ࢄ߹ҙͷ࢓૊Έ͕ෆཁʹ࣮૷΋ӡ༻΋؆୯ 50
  17. ϦϨʔγϣφϧϞσϧΛఘΊΔ 52 •  ςʔϒϧߏ଄ => KVS (Mapߏ଄) •  Pkey͸ΧϥϜͷͻͱ͔ͭΒબͿ΋ͷ=> ඞਢͷ΋ͷ

    •  JOINΛͤ͞ͳ͍ •  ͜ΕʹΑΓεέʔϧΞ΢τܕͷ෼ࢄ͕Մೳʹ •  εΩʔϚΛࣄલఆٛ͠ͳ͍
  18. εΩʔϚ: σʔλϕʔεͷσʔλߏ଄ΛܾΊΔ΋ͷ •  εΩʔϚΛࣄલʹఆٛ: ϦϨʔγϣφϧϞσϧʹԊͬͨ࢖͍ํ •  Pros: σʔλߏ଄ʹڧྗͳ੍໿Λ՝ͨ͢Ί࠷దԽ͠΍͍͢ˍΞϓϦ Λ։ൃ͠΍͍͢ • 

    Cons: ΞϓϦͷมߋίετ͕ߴ͍ˍ࠷ॳʹద੾ʹઃܭ͠ͳ͍ͱ͔ͳ Γมߋʹऑ͍ •  εΩʔϚΛࣄલʹܾΊΔඞཁ͕ͳ͍৔߹ •  Pro: σʔλߏ଄Λޙ͔ΒͰ΋͔ͳΓॊೈʹมߋͰ͖Δ •  Con: ςετ͕ͳ͍ͱόά͕ग़΍͍͢ɺσʔλߏ଄͕ෳࡶͩͱ࠷ద Խ͠ʹ͍͘ 53
  19. ৽͍͠σʔλϞσϧ ΧϥϜࢦ޲ •  ΧϥϜΛ͍͘ΒͰ΋૿΍ ͢͜ͱ͕Ͱ͖Δ •  Query LanguageΛ࡞Δ •  ʮΧϥϜϑΝϛϦʔʯ

    •  HBase, Cassandra υΩϡϝϯτࢦ޲ •  ݸʑͷϨίʔυ͕ࣗ༝ͳ ܗࣜΛ࣋ͭ •  MapReduceͰΫΤϦ •  XML, JSON, etc … •  CouchDB, MongoDB 54
  20. 58

  21. ͍··Ͱઆ໌͖ͯͨ͠ͷͰ؆୯ ໰୊ Riak͸Ͳ͏͍ͯ͠Δ͔ εέʔϦϯά ϋογϡతϧʔςΟϯάͰGossipͳਫฏ෼ࢄ ੔߹ੑͷ؅ཧ Vector Clocks, CRDTͳͲRead࣌ʹղܾ 2.0Ͱ͸Paxos΋ೖΔ

    Մ༻ੑͷอূ Hinted HandoffΛ࢖ͬͯৗʹॻ͖ࠐΈ͕Ͱ͖ΔΑ͏ ʹ͍ͯ͠Δ σʔλϞσϧ Blob + Document Based ΠϯσοΫεΛ͸ΕΔ MapReduce͕Ͱ͖Δ ॲཧܥ Erlang VM 59
  22. ࣮૷໘ͰͷRiakͷಛ௃ •  ʮ໷தʹΞϥʔτͰى͜͞Εͳ͍ʯ͜ͱΛ໨ࢦͨ͠ •  Erlang VMΛ࢖༻ •  GCʹΑΔఀࢭ͕࣌ؒͳ͍ •  ແఀࢭͰͷύονద༻ɺղੳɺૢ࡞͕Մೳ

    •  ίϚϯυମܥɾϑΝΠϧߏ଄ɾσʔλߏ଄ΛͳΔ΂͘γϯϓϧʹઃ ܭ͠ӡ༻ෛ୲Λ௿ݮ •  ҆ఆੑΛॏࢹͨ͠ઃܭ •  ίϯύΫγϣϯɺ༧ଌՄೳੑ 60
  23. ࣮૷໘΋ߟྀͨ͠ ੡඼બ୒ͷϙΠϯτ •  JOINͳͲσʔλͷਖ਼نԽ͕ඞཁͳͱ͖: RDBMS •  ΦϯϝϞϦͰ࠷৽ͷ࣌ܥྻσʔλΛݟ͍ͨͱ͖: MongoDB •  ΧϥϜΛ͍͘ΒͰ΋૿΍͍ͨ͠ͱ͖

    •  SQL෩ͷΫΤϦΛ͍ͨ͠ͱ͖ or Մ༻ੑ: Cassandra •  େྔσʔλʹޮ཰తʹόονॲཧΛ͍ͨ͠ͱ͖: Hbase •  σʔλΛͳ͘͞ͳ͍&μ΢ϯλΠϜΛͳ͍ͨ͘͠ͱ͖: Riak 62