Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Lessons Learned the Hard Way: Postgres in Produ...

Lessons Learned the Hard Way: Postgres in Production at GoCardless (PGConf US Edition)

**New content in this edition: better answers for incident 5**

From the beginning, GoCardless has used Postgres for the stuff that matters. It’s the primary store behind our APIs and dashboards.

After 5 years, it’s fair to say we’ve seen some things. From spikes in latency as resources run out, through the weird contortions ORMs put your database through, and on to the perils of running a high-availability cluster - this talk has it all.

In the talk, we’ll run through some war stories from the last two years at GoCardless. By gathering them together, we’ll spot some recurring themes, and hopefully avoid fighting the same battles again!

Chris Sinjakli

March 29, 2017
Tweet

More Decks by Chris Sinjakli

Other Decks in Programming

Transcript

  1. -- Slow query (AccessShare) SELECT DISTINCT(customer_id) FROM payments; -- Forces

    this to queue (AccessExclusive) ALTER TABLE payments ADD COLUMN refunded boolean;
  2. -- Slow query (AccessShare) SELECT DISTINCT(customer_id) FROM payments; -- Forces

    this to queue (AccessExclusive) ALTER TABLE payments ADD COLUMN refunded boolean; -- Which blocks these (AccessShare) SELECT * FROM payments WHERE id = 123;
  3. … LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp23325.11166", size 1245184 LOG: temporary

    file: path "base/pgsql_tmp/pgsql_tmp23325.11165", size 8675328 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp10729.11129", size 47734784 …
  4. SET temp_file_limit = -1; # disable CREATE INDEX pay_merch ON

    payments (merchant_id); RESET temp_file_limit; # re-enable
  5. SET

  6. SET client_encoding TO "utf8"; SET client_min_messages TO "panic"; SET standard_conforming_strings

    = on; SET client_min_messages TO "warning"; SET time zone "UTC";
  7. SET client_encoding TO "utf8"; SET client_min_messages TO "panic"; SET standard_conforming_strings

    = on; SET client_min_messages TO "warning"; SET time zone "UTC";
  8. $ git show a456acb2f2 commit a456acb2f2af8365eb9151c7cd2d5a10c189d191 Author: Harry Marr <[email protected]>

    Date: Wed Oct 28 16:30:02 2015 +0000 Avoid disabling postgres errors # Enable standard-conforming strings if available. def set_standard_conforming_strings - old, self.client_min_messages = client_min_messages, 'panic' - execute('SET standard_conforming_strings = on', 'SCHEMA') rescue nil - ensure - self.client_min_messages = old + execute(<<-SQL, 'SCHEMA') + UPDATE pg_settings + SET setting = 'on' + WHERE name = 'standard_conforming_strings' + SQL end
  9. 70 3 1 26 3 1 Normal Incident App Unused

    Superuser (non-reserved) Superuser (reserved) App Superuser (rejected) Superuser (reserved) 97
  10. su postgres -c 'psql -c "select now()"' Change to the

    Postgres user Run a simple query Connect to the database
  11. # strace su postgres -c 'echo "hello"' getuid() = 0

    # ltrace su postgres -c 'echo "hello"' getlogin() = "chris" getpwnam_r(…) # blocks
  12. # strace su postgres -c 'echo "hello"' getuid() = 0

    # ltrace su postgres -c 'echo "hello"' getlogin() = "chris" getpwnam_r(…) # blocks !?
  13. the user name associated by the login activity with the

    controlling terminal of the current process
  14. the user name associated by the login activity with the

    controlling terminal of the current process
  15. # ps aux | grep '[p]acemakerd' | awk '{print $2}'

    9001 # cat /proc/9001/loginuid 1234
  16. # ps aux | grep '[p]acemakerd' | awk '{print $2}'

    9001 # cat /proc/9001/loginuid 1234 # getent passwd 1234 | cut -d':' -f1 chris
  17. uid = getuid() login = getlogin() passwd = passwd_from_name(login) if

    (passwd.uid == uid) return passwd 0 (root) chris {…} 0 == 1234
  18. uid = getuid() login = getlogin() passwd = passwd_from_name(login) if

    (passwd.uid == uid) return passwd else return passwd_from_uid(uid) 0 (root) chris {…} 0 == 1234
  19. Image credits • Heart emoji - https://github.com/mozilla/fxemoji/blob/ 9f68ca9c5bc51521f9ffe284e00ba8b7308e2c41/svgs/ FirefoxEmoji/u2764-redheart.svg •

    Elephants - https://www.flickr.com/photos/makeitkenya/ 22047623331/ • Detour - https://www.flickr.com/photos/ wwward0/8257660963/ • Kitten - https://www.flickr.com/photos/aigle_dore/7787096102
  20. References (Incident 1) • Safe Operations For High Volume PostgreSQL

    (Braintree) - https://www.braintreepayments.com/blog/safe-operations-for- high-volume-postgresql/ • Zero-downtime Postgres migrations - the hard parts - https:// gocardless.com/blog/zero-downtime-postgres-migrations-the- hard-parts/ • Zero-downtime Postgres migrations - a little help - https:// gocardless.com/blog/zero-downtime-postgres-migrations-a-little- help/
  21. References (Incident 1) • Postgres 9.5 Client Connection Defaults (lock_timeout,

    statement_timeout) - https://www.postgresql.org/docs/9.5/ static/runtime-config-client.html • Postgres 9.6 Client Connection Defaults (idle_in_transaction_session_timeout) - https:// www.postgresql.org/docs/9.5/static/runtime-config-client.html • Postgres 9.5 Error Reporting and Logging (log_lock_waits) - https://www.postgresql.org/docs/9.5/static/runtime-config- logging.html
  22. References (Incident 2) • Postgres 9.5 Resource Consumption (temp_file_limit) -

    https:// www.postgresql.org/docs/current/static/runtime-config- resource.html
  23. References (Incident 3) • PgBouncer feature matrix for pooling modes

    - https:// wiki.postgresql.org/wiki/ PgBouncer#Feature_matrix_for_pooling_modes • The Troubleshooting Tales: issues scaling Postgres connections - https:// gocardless.com/blog/the-troubleshooting-tales-issues-scaling-postgres- connections/ • Rails commit that fixed client_min_messages issue - https://github.com/ rails/rails/commit/a456acb2f2af8365eb9151c7cd2d5a10c189d191 • Rick Branson's tweet about the same problem in Django - https:// twitter.com/rbranson/status/675005104701870080
  24. References (Incident 4) • Postgres 9.5 Connections and Authentication (superuser_reserved_connections)

    - https:// www.postgresql.org/docs/9.5/static/runtime-config- connection.html
  25. References (Incident 5) • Julia Evans' strace zine - http://jvns.ca/blog/2015/04/14/

    strace-zine/ • All fun and games until you start with GameDays - https:// gocardless.com/blog/game-days-at-gc/ • freebsd-hackers thread - https://lists.freebsd.org/pipermail/ freebsd-hackers/2017-March/050757.html
  26. References (Incident 5) • Current code in `su` to look

    up user - https://github.com/ shadow-maint/shadow/blob/ ef45bb2496182b5df90ad0323bef75d1a5d69887/contrib/ pwdauth.c#L127 • Much older similar code in `su` (1994 or earlier) - https:// github.com/freebsd/freebsd/blob/ b5c3fb9427806b740f7df3e43a1513e1f5fa840b/usr.bin/su/ su.c#L255
  27. References (Incident 5) • getuid manual - http://pubs.opengroup.org/onlinepubs/ 009695399/functions/getuid.html •

    getlogin manual - http://pubs.opengroup.org/onlinepubs/ 009695399/functions/getlogin.html • getpwnam manual - http://pubs.opengroup.org/onlinepubs/ 9699919799/functions/getpwnam.html • getpwuid manual - http://pubs.opengroup.org/onlinepubs/ 009695399/functions/getpwuid.html
  28. References (Misc) • Practical Postmortems at Etsy - https://www.infoq.com/articles/ postmortems-etsy

    • Scott Andreas' tweet about investigating things - https:// twitter.com/cscotta/status/535898821914419200