Upgrade to Pro — share decks privately, control downloads, hide ads and more …

位置情報を使ったサービス「スマポ」をPostgreSQLで作ってみた db tech show...

位置情報を使ったサービス「スマポ」をPostgreSQLで作ってみた db tech showcase 2013 Tokyo

Avatar for Yoshiyuki Asaba

Yoshiyuki Asaba

November 13, 2013
Tweet

More Decks by Yoshiyuki Asaba

Other Decks in Technology

Transcript

  1. αʔόߏ੒ • AWSΛར༻ • Instance • m1.large * 2 (master/slave,

    availability zoneΛ෼͚͍ͯΔ) • ELB→App Servers→DB Servers • EBS • provisioned IOPS volume (IOPS=1000) • ෼ੳ؀ڥ͸ผͷσʔληϯλʔʹ഑ஔ
  2. ྫɿFILLFACTOR=90 • ͬ͘͘͢͟͝Γͨ͠આ໌Ͱ͢ 8KB 8KB … 8KB 8KB PageHeader lineptr1(lp)

    lp2 lp4 lp3 … … lpN SpecialSpace tuple1 tuple2 tuple3 tuple4 … tupleN FreeSpace (90%௒͑ͨΒ࣍ͷϖʔδ)
  3. STREAMING REPLICATION • PostgreSQL 9.0͔Βಋೖ͞Εͨػೳ • SLAVE͸Read-onlyͳDBͱͯ͠ಈ͔͢͜ͱ΋Մೳ • hot standby

    wal_level = hot_standby max_wal_senders = 5 wal_keep_segments = 16 hot_standby = on max_standby_streaming_delay = 90s standby_mode = 'on' primary_conninfo = 'host=x.x.x.x port=5432 user=repl_user password=XXXXXXX' restore_command = 'envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch "%f" "%p"' masterͷ postgresql.conf slaveͷpostgresql.conf slaveͷrecovery.conf
  4. archive_mode = on archive_command = 'envdir /etc/wal-e.d/ env /usr/local/bin/wal-e wal-push

    %p' WAL-E • https://github.com/wal-e/wal-e standby_mode = 'on' restore_command = 'envdir /etc/wal-e.d/ env /usr/local/bin/wal-e wal-fetch "%f" "%p"' postgresql.conf recovery.conf Amazon S3 PostgreSQL (master) PostgreSQL (slave) PostgreSQL (analytics) backup-push backup-fetch wal-push wal-fetch
  5. ϑΝΠϧͷ৔ॴ test=# SELECT relname, current_setting('data_directory') || '/' || pg_relation_filepath(oid) as

    filepath, pg_relation_size(oid) as filesize FROM pg_class WHERE relname = 'sample_table'; ! relname | filepath | filesize --------------+-----------------------------------------------+---------- sample_table | /var/lib/postgresql/9.2/test/base/16385/18572 | 1351680 (1 row) test=# SELECT relname, current_setting('data_directory') || '/' || pg_relation_filepath(oid) as filepath, pg_relation_size(oid) as filesize FROM pg_class WHERE oid IN (SELECT indexrelid FROM pg_index WHERE indrelid = (SELECT oid FROM pg_class WHERE relname = ‘sample_table')); ! relname | filepath | filesize --------------------+----------------------------------------------- +---------- sample_table_pkey | /var/lib/postgresql/9.2/test/base/16385/22203 | 16384 idx_sample_table_b | /var/lib/postgresql/9.2/test/base/16385/22205 | 16384 idx_sample_table_c | /var/lib/postgresql/9.2/test/base/16385/22206 | 16384
  6. ࢀরͷνϡʔχϯά • EXPLAIN ANALYZEͰͲͷ࣮ߦϓϥϯ͕஗͍͔֬ೝ • table scan͕஗͍ • index͕ແ͍ or

    index͕ෆద੾ʁ • table join͕஗͍ • ANALYZE͕଍Γͳ͍͔֬ೝ • limit͕۟͋Δ৔߹͸subqueryʹͯ͠JOINͷճ਺ΛݮΒͤΔ͔ࢼ͢
  7. EXPLAIN ANALYZE • ࣮ߦܭը͕ͦΕͧΕͲΕ͘Β͍͔͔͔࣌ؒͬͨݟΔ͜ ͱ͕Մೳ test=# EXPLAIN ANALYZE SELECT t1.a,

    t2.a FROM t1, t2 WHERE t1.a = t2.a and t1.a < 100; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.85..805.31 rows=95 width=8) (actual time=0.038..3.319 rows=99 loops=1) -> Index Only Scan using t1_pkey on t1 (cost=0.42..10.09 rows=95 width=4) (actual time=0.012..0.449 rows=99 loops=1) Index Cond: (a < 100) Heap Fetches: 99 -> Index Only Scan using idx_t2_a on t2 (cost=0.42..8.36 rows=1 width=4) (actual time=0.006..0.011 rows=1 loops=99) Index Cond: (a = t1.a) Heap Fetches: 99 Total runtime: 3.782 ms (8 rows) ϓϥϯϊʔυ ίετ ࣮ߦ࣌ؒ
  8. ΍ͬͨ͜ͱ • ࢖ͬͯ΄͍͠ΠϯσοΫε͕࢖ΘΕ͍ͯͳ͍ͷͰΫΤ ϦΛॻ͖׵͑ • ΠϯσοΫεͳ͍΋ͷ͸௥Ճ • JOINͨ͋͠ͱʹLIMIT͍ͯͨ͠Օॴ͸ɺઌʹLIMIT͔ͯ͠ ΒJOIN͢ΔΑ͏ʹมߋ SELECT

    *FROM a, b WHERE a.id = b.id and a.hoge >= 1000 LIMIT10; ↓ SELECT* FROM (SELECT * FROM a WHERE a.hoge >= 1000 LIMIT10) as aa, b WHERE aa.id = b.id
  9. PostgreSQL 9.1→9.2 PostGIS 1.5→2.0 tool down time comment pg_upgrade ˚

    postgisͷupgradeʹ ରԠ͍ͯ͠ͳ͍ʁ pg_dump/ pg_restore ☓ ఀࢭ͕࣌ؒ௕͍ slony-I ̋ ໘౗
  10. SELECT WAITING • DB migration࣌ͳͲʹϩοΫ଴͕ͪൃੜ • psίϚϯυ • ps aux

    | grep postgres | grep waiting • SELECT * FROM pg_locks; • ΋͏গ͠ࡉ͔͘ϩοΫͷ֫ಘঢ়گΛ֬ೝͰ͖Δ
  11. ΫΤϦΩϟϯηϧ • ΫΤϦΛΩϟϯηϧ • SELECT pg_cancel_backend(pid); • or kill -INT

    pid • ΫΤϦΛΩϟϯηϧͯ͠όοΫΤϯυϓϩηεΛམͱ͢ • SELECT pg_terminate_backend(pid) • or kill -TERM pid
  12. ࣮ࡍʹى͖ͨέʔεʢ؆ུ൛ʣ 2013-10-30 00:11:22 JST DETAIL: Process 3225 waits for ShareLock

    on transaction 11759339; blocked by process 3040. Process 3040 waits for ShareLock on transaction 11759337; blocked by process 3225. Process 3225: insert into foo (x,y,z) select x, 12345, z from bar where id in (( values (1), (2), (3) ) except ( select x from foo where date = '...')) ! Process 3040: insert into foo (x,y,z) select x, 12345, z from bar where id in (( values (2), (3), (1) ) except( select x from foo where date = '...')) !
  13. ௐࠪ • લఏɿ։ൃ؀ڥͰ΍Δ͜ͱ • postgresql.conf • log_statement = all •

    log_line_prefixʹ%pΛ͚ͭΔ • COMMIT௚લͷpg_locksΛ֬ೝ
  14. BACK TRACE (gdb) bt … #5 0x00000000006f0ada in LockAcquire (locktag=0x7fff7ce28d20,

    lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:662 #6 0x00000000006effec in XactLockTableWait (xid=768) at lmgr.c:495 #7 0x00000000004890e4 in _bt_doinsert (rel=0x7fa0c1634838, itup=0x2122468, checkUnique=UNIQUE_CHECK_YES, heapRel=0x7fa0c162f820) at nbtinsert.c:168 #8 0x000000000048f3b4 in btinsert (fcinfo=0x7fff7ce28e40) at nbtree.c:257 #9 0x0000000000819bb5 in FunctionCall6Coll (flinfo=0x2115650, collation=0, arg1=140328416004152, arg2=140735288611488, arg3=140735288611840, arg4=34743148, arg5=140328415983648, arg6=1) at fmgr.c:1439 #10 0x0000000000487c64 in index_insert (indexRelation=0x7fa0c1634838, values=0x7fff7ce292a0, isnull=0x7fff7ce29400 "", heap_t_ctid=0x212236c, heapRelation=0x7fa0c162f820, checkUnique=UNIQUE_CHECK_YES) at indexam.c:216 #11 0x00000000005f29aa in ExecInsertIndexTuples (slot=0x21167c0, tupleid=0x212236c, estate=0x2115e60) at execUtils.c:1087 #12 0x0000000000605273 in ExecInsert (slot=0x21167c0, planSlot=0x21167c0, estate=0x2115e60, canSetTag=1 '\001') at nodeModifyTable.c:248
  15. create table tt(a text unique); select pg_backend_pid(); begin; insert into

    tt values (‘aa’), (‘bb’) select pg_backend_pid(); begin; insert into tt values (‘bb’), (‘aa’) (gdb) b _bt_doinsert Breakpoint 1 at 0x488fc7: file nbtinsert.c, line 106. (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 ‘aa’Λinsertͨ͠ͱ͜ΖͰࢭΊΔ ! (gdb) c (gdb) b _bt_doinsert Breakpoint 1 at 0x488fc7: file nbtinsert.c, line 106. (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 ‘bb’Λinsertͨ͠ͱ͜ΖͰࢭΊΔ ! (gdb) c psql psql gdb gdb
  16. ճආํ๏ 2013-10-30 00:11:22 JST DETAIL: Process 3225 waits for ShareLock

    on transaction 11759339; blocked by process 3040. Process 3040 waits for ShareLock on transaction 11759337; blocked by process 3225. Process 3225: insert into foo (x,y,z) select x, 12345, z from bar where x in (( values (1), (2), (3) ) except ( select x from foo where date = ‘…’)) ORDER BY x ! Process 3040: insert into foo (x,y,z) select x, 12345, z from bar where x in (( values (2), (3), (1) ) except( select x from foo where date = ‘...')) ORDER BY x !
  17. AWS • instanceΛscale up • m1.large → m3.2xlarge • app

    server૿ڧ • Elastic LoadBalancerͷpre-warming
  18. PERFORMANCE TEST • pgbenchΛ࢖༻ • tps(transaction per second)Λ஌Δ͜ͱ͕Ͱ͖Δ • -f

    ࣮ߦ͍ͨ͠SQLϑΝΠϧ • -c ฒྻ਺ • -t τϥϯβΫγϣϯ਺ • New Relic/CloudWatchͰύϑΥʔϚϯεΛ؂ࢹ
  19. postgresql.conf • memory • shared_buffer=౥ࡌϝϞϦͷ1/4ఔ౓ • wal_buffer=16MB • checkpoint •

    checkpoint_segments = 64 • checkpoint_timeout = 1h • checkpoint_completion_target = 0.9 ! • planner • effective_cache_size=page cacheͷαΠ ζ • random_page_cost=2.0 • lock • deadlock_timeout = 10s