Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Playing Nice with Your Friends: Database Divers...

Playing Nice with Your Friends: Database Diversity with Postgres FDWs

You want to use Postgres. Your data lives outside of Postgres, and you're not ready to migrate it. Or, you want to leverage performance features of other databases like Redis, so you can't migrate it to Postgres just yet. In either case, you can still use Postgres! Foreign Data Wrappers (FDWs) give you the ability to query external data from within Postgres, just as if it were a regular Postgres table. This talk will discuss all about FDWs and how you can leverage this feature for performance and stability.

As presented at COSCUP Taiwan 2024-08-03

Richard Yen

August 03, 2024
Tweet

More Decks by Richard Yen

Other Decks in Technology

Transcript

  1. ©EDB 2024 — ALL RIGHTS RESERVED. Playing Nice with Your

    Friends Database Diversity with Postgres FDWs Richard Yen COSCUP 2024 2024-08-03
  2. © EDB 2024 - All Rights Reserved 2 • Software

    Developer/Support Engineer @ EDB since 2015 • Previously a DBA and Web Developer • Been using PostgreSQL since v. 7.4 About Me
  3. © EDB 2024 - All Rights Reserved 3 • Not

    ready to go 100% PostgreSQL • 100% PostgreSQL is not working out • Want to leverage features in other engines • Organizational structure does not make 100% PostgreSQL feasible • Don't want to put all your eggs into one basket • Want to leverage more hardware Is this you?
  4. © EDB 2024 - All Rights Reserved 4 • What

    are Foreign Data Wrappers? • Varieties and use cases for FDWs • Setting up FDWs • Demo • Things to consider when implementing FDWs into your infrastructure • How FDWs help achieve diversity • Creating your own FDW • Demo Our Roadmap
  5. © EDB 2024 - All Rights Reserved 5 • Extensions

    that allow access to data outside of a PostgreSQL database cluster • Could be another PostgreSQL database • Another RDBMS/NoSQL database • Flat files, REST APIs, etc. • Data from the external data source is represented as a table (called a foreign table) • Foreign tables can be used in SQL queries just like ordinary tables • In many cases, foreign tables can receive DML What are Foreign Data Wrappers?
  6. © EDB 2024 - All Rights Reserved 6 • Oracle

    DBLink • PG 7.2 implemented its own version of DBLink • Access to data was not very intuitive (used dblink-specific functions) A bit of history postgres=# create view foreign_view as select * from dblink('host=pg1 user=postgres dbname=postgres', 'select aid,bid,abalance, fi ller from pgbench_accounts') as t1 (aid int,bid int,abalance int, fi ller text); CREATE VIEW postgres=# select * from foreign_view where aid = 1; aid | bid | abalance | fi ller -----+-----+----------+-------------------------------------------------------------------------------------- 1 | 1 | 3 | (1 row) postgres=# select * from dblink('host=pg1 user=postgres dbname=postgres', 'select aid,bid,abalance, fi ller from pgbench_accounts') as t1 (aid int,bid int,abalance int, fi ller text) where aid = 1; aid | bid | abalance | fi ller -----+-----+----------+-------------------------------------------------------------------------------------- 1 | 1 | 3 | (1 row)
  7. © EDB 2024 - All Rights Reserved 7 • Oracle

    DBLink • PG 7.2 implemented its own version of DBLink • Access to data was not very intuitive (used dblink-specific functions) • SQL/MED (SQL Management of External Data) standard introduced in 2003 • PG 9.1 introduced file_fdw • PG 9.3 introduced postgres_fdw • Very few database engines implement SQL/MED (and Oracle is not one of them!) • Most other database engines don't support as many external sources as PG A bit of history
  8. © EDB 2024 - All Rights Reserved 8 FDW is

    Smarter and Faster postgres=# explain (analyze, verbose) select * from dblink('host=pg1 user=postgres dbname=postgres', 'select aid,bid,abalance, fi ller from pgbench_accounts') as t1 (aid int,bid int,abalance int, fi ller text) where aid = 1; QUERY PLAN ---------------------------------------------------------------------------------------------- Function Scan on public.dblink t1 (cost=0.00..12.50 rows=5 width=44) (actual time=276.996..315.074 rows=1 loops=1) Output: aid, bid, abalance, fi ller Function Call: dblink('host=pg1 user=postgres dbname=postgres'::text, 'select aid,bid,abalance, fi ller from pgbench_accounts'::text) Filter: (t1.aid = 1) Rows Removed by Filter: 99999 (...) Time: 319.799 ms postgres=# explain (analyze, verbose) select * from foreign_table where aid = 1; QUERY PLAN ----------------------------------------------------------------------------------------------------- Foreign Scan on public.foreign_table (cost=100.00..4629.02 rows=1 width=97) (actual time=1.220..1.222 rows=1 loops=1) Output: aid, bid, abalance, fi ller Remote SQL: SELECT aid, bid, abalance, fi ller FROM public.pgbench_accounts WHERE ((aid = 1)) (...) Time: 4.259 ms
  9. © EDB 2024 - All Rights Reserved 10 • Install

    the extension • Create a foreign server • Create a user mapping (if necessary) • Create foreign tables (or import foreign schema) Getting Started with FDW postgres=# CREATE EXTENSION postgres_fdw; CREATE EXTENSION postgres=# CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'pg1', port '5432', dbname 'postgres'); CREATE SERVER postgres=# CREATE USER MAPPING FOR postgres SERVER foreign_server OPTIONS (user 'postgres', password 'password'); CREATE USER MAPPING postgres=# CREATE FOREIGN TABLE foreign_table (aid integer NOT NULL, bid integer, abalance integer, fi ller text) SERVER foreign_server OPTIONS (schema_name 'public', table_name 'pgbench_accounts'); CREATE FOREIGN TABLE -- Another option for creating foreign tables postgres=# IMPORT FOREIGN SCHEMA public FROM SERVER foreign_server INTO public; IMPORT FOREIGN SCHEMA
  10. © EDB 2024 - All Rights Reserved 11 • Useful

    commands: • \des -- describe external (foreign) servers • \deu -- describe eternal user mappings • \det -- describe external tables Getting Started with FDW
  11. © EDB 2024 - All Rights Reserved 13 Tips •

    Be mindful of what gets pushed down to the remote server postgres=# explain (verbose, analyze) select * from foreign_table ft join pgbench_accounts a on a.aid=ft.aid where a.abalance = 10 or ft.abalance = 9; QUERY PLAN ------------------------------------------------------------------------------------------------------------ -------------- Hash Join (cost=7192.00..18422.50 rows=14243 width=194) (actual time=109.947..963.111 rows=14502 loops=1) (...) -> Foreign Scan on public.foreign_table ft (cost=100.00..6379.00 rows=100000 width=97) (actual time=3.742..612.295 rows=100000 loops=1) Output: ft.aid, ft.bid, ft.abalance, ft. fi ller Remote SQL: SELECT aid, bid, abalance, fi ller FROM public.pgbench_accounts -> Hash (cost=4279.00..4279.00 rows=100000 width=97) (actual time=105.742..105.744 rows=100000 loops=1) (...) Planning Time: 0.402 ms Execution Time: 966.560 ms (16 rows)
  12. © EDB 2024 - All Rights Reserved 14 Tips •

    Be mindful of what gets pushed down to the remote server postgres=# explain (verbose, analyze) select * from foreign_table ft join pgbench_accounts a on a.aid=ft.aid where ft.abalance = 10 or ft.abalance = 9; QUERY PLAN ------------------------------------------------------------------------------------------------------ ------------- Hash Join (cost=5352.20..10151.80 rows=14560 width=194) (actual time=130.925..193.293 rows=15156 loops=1) (...) -> Hash (cost=5170.20..5170.20 rows=14560 width=97) (actual time=128.999..129.003 rows=15156 loops=1) Output: ft.aid, ft.bid, ft.abalance, ft. fi ller Buckets: 16384 Batches: 1 Memory Usage: 2038kB -> Foreign Scan on public.foreign_table ft (cost=100.00..5170.20 rows=14560 width=97) (actual time=3.912..123.474 rows=15156 loops=1) Output: ft.aid, ft.bid, ft.abalance, ft. fi ller Remote SQL: SELECT aid, bid, abalance, fi ller FROM public.pgbench_accounts WHERE (((abalance = 10) OR (abalance = 9))) Planning Time: 0.437 ms Execution Time: 196.240 ms (13 rows)
  13. © EDB 2024 - All Rights Reserved 15 Tips •

    Network will always be your biggest bottleneck, so minimize the amount of data sent from the remote server • Performance can depend on foreign server's hardware • DDL changes are not automatically propagated • Upgrades need more care and attention, in case of any incompatibilities • User mappings require thoughtfulness regarding security • Backups of database do not include data in foreign tables TL;DR - remember that you're no longer dealing with just one machine, but two or more
  14. © EDB 2024 - All Rights Reserved 16 Data Diversity

    • Because FDWs allow you to access external data for 50+ sources, you can leverage PostgreSQL to connect two other sources • Good for ETL Pipelines • Good for varied infrastructure • Prevents having to migrate data from one place to another • Demo: MySQL + SQLite
  15. © EDB 2024 - All Rights Reserved 17 Building your

    own FDW • Native • Requires C programming knowledge • See the PostgreSQL docs • Multicorn (https://multicorn.org/) • Wrapper interface for Python apps • Not actively developed, but still works • FDW APIs exist for other languages (i.e., Holycorn for Ruby")