Percona (on a shared webspace)
! We added UTF-8 / timezone support on the fly ™
! 9 years ago we found out: adding new indices or columns on production was a massive pain
! We also found out: adding multiple indices in parallel might blow up your database engine (see the index sketch below)
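As an illustration only (not the exact tooling from back then): in Postgres you can add an index without locking out writes by using CREATE INDEX CONCURRENTLY, one index at a time. The sketch below assumes psycopg2 as the driver; the DSN, table, and column names are made up.

```python
# Sketch: adding an index without blocking writes, using CREATE INDEX CONCURRENTLY.
# The DSN, table ("posts"), and column ("customer_id") are made-up examples.
import psycopg2

conn = psycopg2.connect("dbname=swat user=app")  # hypothetical connection string
# CONCURRENTLY cannot run inside a transaction block, so enable autocommit.
conn.autocommit = True

with conn.cursor() as cur:
    # Builds the index without taking an exclusive lock on the table.
    # It is slower than a plain CREATE INDEX and should be run one at a time,
    # not for several indices in parallel.
    cur.execute(
        "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_posts_customer_id "
        "ON posts (customer_id)"
    )

conn.close()
```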
! We migrated to Postgres 9.6 and had a decent Master-/Slave setup on physical hardware
! We upgraded RAM + SSDs, but …
! A couple of years later we found out: there's a physical limit to how much RAM and how many SSDs you can put into one machine (2 x 2xCPU 2.3GHz 10-core 128GB RAM at the start, in the end: 384GB RAM)
place left on the SSDs (2.5 TB)
! We were stuck on Postgres 9.6 because:
! No space for duplicating the DB on the server with a newer Postgres version
! No idea how long an in-place upgrade to a new version would take (downtime > 6h: impossible)
! No secondary system to duplicate Swat.io onto for upgrade tests
then (only S3)
! While talking about life in general, the sales manager asked about our current challenges
! When I mentioned “the database is stuck on Postgres 9.6 and we need a new server”, he perked up and saw a huge opportunity
! He dropped the famous sentence: “Johannes, we have a tool called Database Migration Service - with that you can migrate to a newer Postgres database version on AWS on the fly - without downtime”
! I laughed out loud and said: “Sure thing”
with full MySQL and PostgreSQL compatibility, at 1/10th the cost of commercial databases
! Aurora has 5x the throughput of MySQL and 3x of PostgreSQL
! Built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication
! https://aws.amazon.com/rds/aurora/
to finally fully migrate to AWS
→ We made multiple attempts and had some fuck-ups in the process
→ In the end, we really did upgrade to Postgres 13 without any problems (on that day)
→ We lived happily ever after (almost)
replication slot
3. Start a dump (total runtime: 19h)
4. Start Swat.io
5. When the dump is done: upload it to S3 (1h)
6. Import the dump on the AWS machine (16h)
7. Start AWS DMS Change Data Capture (replication lag of 36h)
8. When at 0 latency: VACUUM ANALYZE
9. Manually fix sequences (auto-increments were not properly set with the sync mode we had chosen - see the sketch after this list)
10. Turn off old app / servers
11. Turn on AWS app / servers
12. Cross your fingers
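A minimal sketch of what step 9 can look like, assuming psycopg2 and a direct connection to the new cluster (the DSN is a placeholder): for every serial column, point its sequence at the column's current maximum value.

```python
# Sketch of step 9: re-sync sequences after the migration, since auto-increment
# values were not carried over by the chosen sync mode. Adjust for your schema.
import psycopg2

conn = psycopg2.connect("dbname=swat user=app host=aurora.example")  # placeholder DSN
conn.autocommit = True

with conn.cursor() as cur:
    # Find all columns whose default comes from a sequence (serial / bigserial).
    cur.execute("""
        SELECT pg_get_serial_sequence(quote_ident(table_schema) || '.' || quote_ident(table_name),
                                      column_name) AS seq,
               quote_ident(table_schema) || '.' || quote_ident(table_name) AS tbl,
               quote_ident(column_name) AS col
        FROM information_schema.columns
        WHERE column_default LIKE 'nextval(%'
    """)
    for seq, tbl, col in cur.fetchall():
        if seq is None:
            continue
        # Point the sequence at the current max value of the column (or 1 if empty).
        cur.execute(
            f"SELECT setval(%s, COALESCE((SELECT MAX({col}) FROM {tbl}), 1))",
            (seq,),
        )

conn.close()
```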
AWS DMS bastion host sizing and AWS DMS configuration parameters
! We had some tries where the replication lag did not go down (see the monitoring sketch below)
! We did not know about the sequences in the beginning
! We had some crashes of the old app because AWS DMS was overloading the system (basically a DoS attack)
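One way to watch whether the lag is actually going down is to poll the replication slot on the old primary. The sketch below is an illustration, not the tooling used in the migration; it assumes psycopg2, a placeholder DSN, and the Postgres 9.6 function names (pg_current_xlog_location / pg_xlog_location_diff).

```python
# Sketch: watching how far the DMS replication slot lags behind on the old
# Postgres 9.6 primary (9.6 still uses the pg_xlog_* function names).
import time
import psycopg2

conn = psycopg2.connect("dbname=swat user=monitor")  # placeholder DSN
conn.autocommit = True

while True:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT slot_name,
                   pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),
                                                        restart_lsn)) AS lag
            FROM pg_replication_slots
        """)
        for slot_name, lag in cur.fetchall():
            print(f"{slot_name}: {lag} of WAL not yet consumed")
    time.sleep(60)
```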
machines in the beginning and then downgraded within minutes to a more cost-sensitive instance type)
! (also while migrating!)
! Snapshots all the way
! No need to buy new SSDs
! Spin up a new cluster anytime for tests / upgrades (see the boto3 sketch below)
! In the meantime, we have already upgraded to Aurora Postgres 15 without any problems (16 pending)
! It's really just Postgres - we did not change a single query
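A sketch of what "spin up a new cluster anytime" can look like with boto3: restore a copy-on-write clone of the production cluster at its latest restorable time, attach one instance, and tear it down after the test. All identifiers, the region, and the instance class are placeholders.

```python
# Sketch: cloning the production Aurora cluster for an upgrade test.
import boto3

rds = boto3.client("rds", region_name="eu-central-1")  # assumed region

# Restore a copy-on-write clone of the production cluster at its latest state.
rds.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="swatio-upgrade-test",       # placeholder
    SourceDBClusterIdentifier="swatio-production",   # placeholder
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)

# A cluster has no compute by itself; add one instance to run queries against.
rds.create_db_instance(
    DBInstanceIdentifier="swatio-upgrade-test-1",    # placeholder
    DBClusterIdentifier="swatio-upgrade-test",
    DBInstanceClass="db.r6g.large",                  # small, cheap class for tests
    Engine="aurora-postgresql",
)

# ... run the upgrade test, then clean up:
# rds.delete_db_instance(DBInstanceIdentifier="swatio-upgrade-test-1", SkipFinalSnapshot=True)
# rds.delete_db_cluster(DBClusterIdentifier="swatio-upgrade-test", SkipFinalSnapshot=True)
```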
blindly trust them
! However, sometimes you need to aim for a moonshot
! Be careful with Aurora pricing!
! The "I/O-Optimized" configuration is helpful for saving money
! Boy, this thing is expensive!
! Way higher costs
! Aurora RDS Proxy always lags behind in Postgres version support
! You should turn off AWS instances when they are not needed (see the sketch below)
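For the last point, a small boto3 sketch: stopping a non-production Aurora cluster when nobody needs it and starting it again later. The cluster identifier and region are placeholders; note that AWS automatically restarts a stopped Aurora cluster after 7 days.

```python
# Sketch: stopping a non-production Aurora cluster outside working hours to save money.
import boto3

rds = boto3.client("rds", region_name="eu-central-1")  # assumed region

def stop_staging() -> None:
    # Stops compute; storage is kept and billed as usual.
    rds.stop_db_cluster(DBClusterIdentifier="swatio-staging")  # placeholder

def start_staging() -> None:
    rds.start_db_cluster(DBClusterIdentifier="swatio-staging")  # placeholder
```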