Explain EXPLAIN

Explain EXPLAIN EXPLAINを使ったPostgreSQLのクエリ最適化の基本と実践 PostgreSQL Conference Japan 2024 December 6, 2024
Keiko Oda - pganalyze

Speaker Introduction • 織田敬子 (Keiko Oda) • Product Engineer
at pganalyze ◦ スポンサーしてます！ • 金沢市在住 Kanazawa, Ishikawa 2 2024年に飲んだビールの数々 In SF, Vancouver, NY, Philadelphia

Today’s Goal • EXPLAINとPlannerの基本をしっかり抑える ◦ EXPLAINとは、Plannerとは、使い方、基本的な読み方 • EXPLAINのプランノードについてざっくり言えるようになる ◦ Scan
Nodes, Join Nodes, Other Nodes • クエリを最適化するためのサイクルを押さえる ◦ どのようにベンチマークを測定すべきか • クエリが遅くなるパターンとその対処法を学ぶ ◦ 統計情報がPlannerに与える影響を理解する • Plannerの気持ちがちょっとわかるようになる 3

EXPLAINとPlannerの基本 4

クエリ実行のOverview Postgres内部ではクエリ実行にあたって4つのステップがある 5 Parser Rewriter Planner Executor ①構文解析 ②書き換え
③プランの作成 ④実行 SELECT id, name FROM users WHERE org_id = 123; id | name ----+------- 1 | Alice 3 | Beth 7 | Emily Plan tree Query tree Query Result Tweaked Query tree

EXPLAINとは Plannerが作成したクエリのプラン（実行計画）を表示する 6 Parser Rewriter Planner Executor ①構文解析 ②書き換え ③プランの作成
④実行 SELECT id, name FROM users WHERE org_id = 123; id | name ----+------- 1 | Alice 3 | Beth 7 | Emily Plan tree Query tree Query Result Tweaked Query tree Plan tree 1 Plan tree 2 Plan tree 3

Plannerのおしごと Plannerの仕事：最適なプラン（実行計画）の作成 Choosing the right plan to match the query
structure and the properties of the data is absolutely critical for good performance, so the system includes a complex planner that tries to choose good plans. クエリに合った最適なプランを選ぶことはクエリ最適化には必須で、Plannerは頑張って最適なものを選ぼうとしている。 PostgreSQL Documentation: Using EXPLAIN 📖-1 7

Plannerのおしごと 1. 与えられたクエリに対して取りうるプランを考える ◦ Indexを使うか・使わないか、どの JOINを使うか・どの順番で JOINするか 2. そのプランのコストを計算 3.
最終的に一番小さいコストのプランを選ぶ ◦ プランにかけられる時間・リソースは無限ではない（場合によっては総当たりはしない） 👉 EXPLAINを使うことによって、Plannerがどんなプランを選んだかを知ることができ、また本当に最適なものが選ばれているかを判断できる 👉 クエリのどの箇所でコスト/時間が使われているかを知ることができる 8

EXPLAINの使い方クエリの前にEXPLAINをつける 👉 ツリー構造になったプランノードであるプランツリー（ Plan tree）が出力される EXPLAIN SELECT * FROM
tenk1 LIMIT 3; QUERY PLAN ------------------------------------------------------------------- Limit (cost=0.00..0.13 rows=3 width=244) -> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244) (2 rows) 9 Data source: tenk1 from Postgres source code 🗄-1 プランノードプランツリー

EXPLAINの使い方 EXPLAINコマンドのオプション（抜粋）コマンドデフォルト ANALYZE 実際にクエリを実行し、かかった時間等を表示する FALSE VERBOSE 詳しい情報（各ノードの出力列名等）を表示する FALSE
BUFFERS バッファの使用状況を表示する FALSE FORMAT アウトプットのフォーマットを指定する TEXT COSTS 全体および各ノードのコストを表示する TRUE TIMING ANALYZE有効時、各ノードでかかった時間を表示する TRUE 10

EXPLAINの使い方オプションを複数つけるときは括弧でくくる -- 基本形（しかし得られるデータはミニマル） EXPLAIN SELECT * FROM tenk1; --
ANALYZEつき EXPLAIN ANALYZE SELECT * FROM tenk1; -- 全部入り EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM tenk1; -- 可視化・解析ツールを使うときによりよい精度が期待できる（目視には向かない） EXPLAIN (FORMAT JSON) SELECT * FROM tenk1; 11 psql内では \t\a をすると JSONを見やすく出力できる

EXPLAINの読み方 12

EXPLAINの読み方 - COSTS EXPLAIN ANALYZE SELECT * FROM tenk1 t1,
tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------- -------------------------------------------------------------- Nested Loop (cost=4.65..118.50 rows=10 width=488) (actual time=0.017..0.051 rows=10 loops=1) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) (actual time=0.009..0.017 rows=10 loops=1) Recheck Cond: (unique1 < 10) Heap Blocks: exact=10 -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.004..0.004 rows=10 loops=1) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) (actual time=0.003..0.003 rows=1 loops=10) Index Cond: (unique2 = t1.unique2) Planning Time: 0.485 ms Execution Time: 0.073 ms 13 PostgreSQL Documentation: Using EXPLAIN 📖-1

EXPLAINの読み方 - COSTS • Plannerのコストパラメータに基づいた尺度で決められる単位 ◦ seq_page_costを1としてrandom_page_costを4とする、など（ランダムアクセスのほうがexpensiveなので大きい値） ◦ 時間やバイト数などではなく、「クエリの実行がどれだけ大変か」の尺度
◦ コストパラメータの変更は可能 • Plannerはクエリに対して様々なプランを考え、コストが最小ものを選ぶ ◦ このコストが的外れだと、Plannerが適切ではないプランを選ぶ可能性がある ◦ 統計情報の重要性 • 上位ノードには子ノードのコストも含まれる ◦ EXPLAINで出てくる一番最初のノード（ルートノード）が全体のコストとなる 14

EXPLAINの読み方 - COSTS • Start-up Cost: 最初の行を取得するまでの推定コスト ◦ シーケンシャルスキャンではすぐに最初の行を取得するため 0に近い
◦ ソート処理ではソートが先に入るため Start-up Costがある程度かかる • Total Cost: すべての行を取得する推定コスト • Rows: 取得される推定行 • Width: 各行の推定平均バイト数 Nested Loop (cost=4.65..118.50 rows=10 width=488) Start-up Cost Total Cost 15

EXPLAINの読み方 - ANALYZE, TIMING EXPLAIN ANALYZE SELECT * FROM tenk1
t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------- -------------------------------------------------------------- Nested Loop (cost=4.65..118.50 rows=10 width=488) (actual time=0.017..0.051 rows=10 loops=1) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) (actual time=0.009..0.017 rows=10 loops=1) Recheck Cond: (unique1 < 10) Heap Blocks: exact=10 -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.004..0.004 rows=10 loops=1) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) (actual time=0.003..0.003 rows=1 loops=10) Index Cond: (unique2 = t1.unique2) Planning Time: 0.485 ms Execution Time: 0.073 ms 16 PostgreSQL Documentation: Using EXPLAIN 📖-1

Planning time: 0.485 ms Execution time: 0.073 ms EXPLAINの読み方 -
ANALYZE, TIMING • Planning Time: クエリのプラン策定にかかった時間 • Execution Time: クエリの実行にかかった時間 • 実際のクエリの実行にはこれにプラスして出力のシリアライズ+出力をクライアントに送信する通信コストがかかる 17 Parser Rewriter Planner Planning Time Executor Execution Time ①構文解析 ②書き換え ③プランの作成 ④実行

EXPLAINの読み方 - ANALYZE, TIMING • Actual time: 各ノードの実行にかかった時間（ms） ◦ Costsと同様にstart-upとtotalのかかった時間を表示
◦ TimingがTRUE（デフォルト）のとき表示 • Rows: 実際に取得された行数 • Loops: 何回そのノードが実行されたか ◦ Actual timeとrowsは各実行における値となるので、複数回の場合はこれを掛けることでトータルの実行にかかった時間や取得された行を知ることができる Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) (actual time=0.003..0.003 rows=1 loops=10) 実際にかかった時間 0.003 * 10 = 0.03 ms 18

EXPLAINの読み方 - ANALYZE, TIMING Nested Loop (actual time=TIME FOR THIS
AND ALL CHILDREN rows=THE REAL ROW COUNT loops=1) -> Seq Scan on something (actual time=THE TIME IT REALLY TOOK rows=THE REAL ROW COUNT loops=1) -> Index Scan using someidx on somethingelse (actual time=NOT REALLY HOW LONG IT TOOK rows=NOT REALLY HOW MANY ROWS WE GOT loops=HUGE NUMBER) 19 From Postgres mailing list: “explain analyze rows=%.0f” (Robert Haas)

EXPLAINの読み方 - BUFFERS, I/O Timing EXPLAIN (ANALYZE, BUFFERS) SELECT *
FROM procurement_notices WHERE deadline_date < '2024-01-01'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- ---------- Seq Scan on procurement_notices (cost=0.00..30206.38 rows=257720 width=444) (actual time=0.270..76.563 rows=258386 loops=1) Filter: (deadline_date < '2024-01-01'::date) Rows Removed by Filter: 53164 Buffers: shared hit=110 read=26202 I/O Timings: shared read=31.352 Planning Time: 0.155 ms Execution Time: 85.389 ms (7 rows) 20 Data source: Procurement Notice from World Bank Open Data 🗄-2

EXPLAINの読み方 - BUFFERS, I/O Timing • BUFFERSオプションでバッファの使用状況を表示できる ◦ どれだけのデータがバッファ（キャッシュ）からきているのかどうかがわかる •
Buffer types ◦ Shared block: 通常のテーブルやインデックス = Shared Buffers ◦ Local block: 一時テーブルやインデックス ◦ Temp block: ソートやハッシュ、マテリアライズ計画ノードなどの短期データ • Buffer events (単位: block) ◦ Hit: キャッシュがヒットした ◦ Read: キャッシュになかったので OSから読んだ ◦ Dirtied: キャッシュに変更が加えられた ◦ Written: 変更が加えられたキャッシュがディスクに書き出された 21

EXPLAINの読み方 - BUFFERS, I/O Timing • BUFFERSオプションでバッファの使用状況を表示できる ◦ どれだけのデータがバッファ（キャッシュ）からきているのかどうかがわかる •
Buffer types ◦ Shared block: 通常のテーブルやインデックス = Shared Buffers ◦ Local block: 一時テーブルやインデックス ◦ Temp block: ソートやハッシュ、マテリアライズ計画ノードなどの短期データ • Buffer events (単位: block) ◦ Hit: キャッシュがヒットした ◦ Read: キャッシュになかったので OSから読んだ ◦ Dirtied: キャッシュに変更が加えられた ◦ Written: 変更が加えられたキャッシュがディスクに書き出された 22 Understanding Postgres IOPS: Why They Matter Even When Everything Fits in Cache 📖-2 Hit Read

EXPLAINの読み方 - BUFFERS, I/O Timing • テーブルサイズ = 206MB, pg_class.relpages
= 26,312 • block_size = 8kB, shared_buffers = 128MB Seq Scan on procurement_notices Buffers: shared hit=110 read=26202 23 procururement_notices テーブル（読みたい情報）

EXPLAINの読み方 - BUFFERS, I/O Timing 24 pg_buffercacheエクステンション Tracking Postgres Buffer
Cache Statistics over time with pganalyze 📖-8

EXPLAINの読み方 - BUFFERS, I/O Timing • track_io_timingと合わせることでI/O Timinig情報を取得できる ◦ 各buffer
typeごとにread/writeのI/Oにかかった時間の情報が取得可能 ▪ Actual timeでかかった時間と比較することで I/Oにかかった時間がわかる ▪ 31.352ms / 76.563ms ~= 41% ◦ オンにするとpg_stat_statementsにもI/O関連の統計が集められて便利 ◦ 情報取得のオーバーヘッドがシステムによっては許容できない場合もあるので使用の際は要注意 ▪ “Prioritize observability > latency” - by Chelsea Dole Seq Scan on procurement_notices (actual time=0.270..76.563 rows=258386 loops=1) I/O Timings: shared read=31.352 25 Postgres Platform "Best Practices" for the Modern DBA 📹-3

EXPLAINのプランノード 26

プランノード - Scan Nodes Scan Nodes Sequential Scan テーブルのすべてのページ（行）を一つずつ順番にスキャンする Index
Scan インデックスを用いて 1つもしくは複数のマッチする行を見つけ、テーブルから行データを取得する Index-Only Scan インデックスを用いて 1つもしくは複数のマッチする行を見つけ、インデックスから直接データを取得する（テーブルにはアクセスしない） Bitmap Index Scan インデックスを用いてマッチする行の Bitmapを作る複数のBitmap Index ScanをBitmap And/Orを用いて繋げることもある Bitmap Heap Scan Bitmap Index Scanにて得られたマッチする行を実際に取得する Scan Nodes: テーブルデータから行を取得する 27 Monitoring Postgres EXPLAIN plans 📖-3

プランノード - Join Nodes Join Nodes Nested Loop アウターテーブルの各行に対して、インナーテーブルの全行を一つずつ結合していく
󰢐小さいテーブル、インナーテーブルにインデックスが使える場合に有効 󰢄大きなテーブル、他の 2つに比べて非効率 Merge Join 結合キーによってあらかじめソートされた 2つのテーブルを結合する 󰢐大きなテーブル同士の結合に有効、既にソートされているとなお良い 󰢄結合キーにインデックスがないとソートに時間がかかる Hash Join インナーテーブルから結合キーを元にハッシュテーブルを作成し、アウターテーブルに対応する値があるかをスキャンして結合する 󰢐インナーテーブルが小さくアウターテーブルが大きい場合に有効、等価結合 󰢄ハッシュテーブルがwork_mem内に収まらないと非常に遅くなる 28 Join Nodes: 2つの子ノードを結合して行を取得する Internals of physical join operators 📹-4

プランノード - Other Nodes Other Nodes Aggregate Count, sumなどに使用される Append
UNIONを使用して2つのサブプランを繋げるときに使用される Limit 指定された行数のみを取得する　子ノードの完了を待つ必要がないため、子ノードよりコストが低くなることがある Sort 子ノードを元にwork_memを使用してソートを行う work_memに乗り切らない場合は遅くなる Unique ソートされた入力を元に重複を排除する DISTINCT+ORDER BYで使われる 29 その他のノード

プランごとのコスト比較 30

使用するクエリ SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1
< 10 AND t1.unique2 = t2.unique2; 31 tenk1 ——————— unique1 unique2 tenk2 ——————— unique1 unique2 • tenk2テーブルはtenk1のコピー（同じデータ） ◦ 10k rows ◦ unique1: 0 - 9999, random order ◦ unique2: 0 - 9999, ascending ◦ 両テーブルunique1, unique2共にindexあり • 大まかな方針：tenk1からunique1が10未満のものを選んでtenk2と結合する ◦ ① 10未満のものを探すときのスキャン方法 ◦ ② tenk2との結合方法

① デフォルトプラン EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN -------------------------------------------------------------------------------------- Nested Loop (cost=4.65..118.50 rows=10 width=488) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) Recheck Cond: (unique1 < 10) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) Index Cond: (unique2 = t1.unique2) (7 rows) 32

WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN -------------------------------------------------------------------------------------- Nested Loop (cost=4.65..118.50 rows=10 width=488) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) Recheck Cond: (unique1 < 10) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) Index Cond: (unique2 = t1.unique2) (7 rows) 33 ① tenk1から 10未満のものを Bitmap Index Scanで探す

WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN -------------------------------------------------------------------------------------- Nested Loop (cost=4.65..118.50 rows=10 width=488) -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) Recheck Cond: (unique1 < 10) -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) Index Cond: (unique2 = t1.unique2) (7 rows) 34 ② tenk1とtenk2を Nested Loopで結合する ①で取り出した10行の1行毎にこのIndex Scanが走る

② SET enable_bitmapscan = off SET enable_bitmapscan = off; EXPLAIN
SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ---------------------------------------------------------------------------------------- Nested Loop (cost=0.57..123.58 rows=10 width=488) -> Index Scan using tenk1_unique1 on tenk1 t1 (cost=0.29..44.46 rows=10 width=244) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) Index Cond: (unique2 = t1.unique2) (5 rows) 35

SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ---------------------------------------------------------------------------------------- Nested Loop (cost=0.57..123.58 rows=10 width=488) -> Index Scan using tenk1_unique1 on tenk1 t1 (cost=0.29..44.46 rows=10 width=244) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) Index Cond: (unique2 = t1.unique2) (5 rows) 36 ① tenk1から 10未満のものを Index Scanで探す

SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ---------------------------------------------------------------------------------------- Nested Loop (cost=0.57..123.58 rows=10 width=488) -> Index Scan using tenk1_unique1 on tenk1 t1 (cost=0.29..44.46 rows=10 width=244) Index Cond: (unique1 < 10) -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) Index Cond: (unique2 = t1.unique2) (5 rows) 37 ② tenk1とtenk2を Nested Loopで結合する

③ SET enable_bitmapscan, enable_indexscan = off SET enable_bitmapscan = off;
SET enable_indexscan = off; EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------- Hash Join (cost=470.12..952.73 rows=10 width=488) Hash Cond: (t2.unique2 = t1.unique2) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) -> Hash (cost=470.00..470.00 rows=10 width=244) -> Seq Scan on tenk1 t1 (cost=0.00..470.00 rows=10 width=244) Filter: (unique1 < 10) (6 rows) 38

SET enable_indexscan = off; EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------- Hash Join (cost=470.12..952.73 rows=10 width=488) Hash Cond: (t2.unique2 = t1.unique2) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) -> Hash (cost=470.00..470.00 rows=10 width=244) -> Seq Scan on tenk1 t1 (cost=0.00..470.00 rows=10 width=244) Filter: (unique1 < 10) (6 rows) 39 ① tenk1から 10未満のものを Seq Scanで探す

SET enable_indexscan = off; EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------- Hash Join (cost=470.12..952.73 rows=10 width=488) Hash Cond: (t2.unique2 = t1.unique2) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) -> Hash (cost=470.00..470.00 rows=10 width=244) -> Seq Scan on tenk1 t1 (cost=0.00..470.00 rows=10 width=244) Filter: (unique1 < 10) (6 rows) 40 ②-1 ①の結果を元にハッシュテーブルを作る

SET enable_indexscan = off; EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------- Hash Join (cost=470.12..952.73 rows=10 width=488) Hash Cond: (t2.unique2 = t1.unique2) -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) -> Hash (cost=470.00..470.00 rows=10 width=244) -> Seq Scan on tenk1 t1 (cost=0.00..470.00 rows=10 width=244) Filter: (unique1 < 10) (6 rows) 41 ②-2 ハッシュテーブルとtenk2を Hash Joinで結合する

コストの比較 42 Bitmap Scan Index Scan Seq Scan Hash Join
522.10 (1.635ms) 527.18 (1.738ms) 952.73 (2.598ms) Merge Join 682.98 (2.345ms) 688.06 (2.117ms) 1629.70 (3.298ms) Nested Loop 118.50 (0.123ms) 123.58 (0.095ms) 2415.03 (5.257ms) ① デフォルトプラン ② SET enable_bitmapscan = off ③ SET enable_bitmapscan, enable_indexscan = off 👉 Plannerはしっかり一番コストが低いものを選んでいる

コストの比較 43 Bitmap Scan Index Scan Seq Scan Hash Join
522.10 (1.635ms) 527.18 (1.738ms) 952.73 (2.598ms) Merge Join 682.98 (2.345ms) 688.06 (2.117ms) 1629.70 (3.298ms) Nested Loop 118.50 (0.123ms) 123.58 (0.095ms) 2415.03 (5.257ms) • Seq Scan：Selectivityが高く使用に適したインデックスがあるのに活用できていないので、 Seq Scanの時点でコストが他に比べ非常に高くなる • Merge Join：大きなテーブル同士の結合ではないので、ソートが必要な分コストが余計にかかり非効率。またアウターテーブルのソートが全件ソートになり更にコストが高くなる • Hash Join：等価結合であるためMerge Joinよりコストはよいが、それでもハッシュテーブルを作る手間がかかる。またアウターテーブルの読み込みが Seq Scanでされるためコスト高

クエリの最適化 44

クエリ最適化のサイクル 45 ②改善箇所の仮説設定 ③改善の適用 Slow Query 変更の確定 ①ベンチマーク測定モニタリングツール
pg_stat_statements log_min_duration_statement auto_explain EXPLAIN ANALYZE

auto_explain • 自動的にSlow QueryのEXPLAINをログに出力するエクステンション ◦ log_min_duration_statementでSlow Queryをログに出力できるが、 auto_explainでは EXPLAINの結果も出力してくれる ◦
ほとんどのクラウドプロバイダでサポートしており比較的簡単に導入できる（ただ有効にするとデータベースの再起動が必要になる） • auto_explainの設定オプション ◦ auto_explainのオプションには普通の EXPLAINコマンドと同様のオプションがある ◦ auto_explain.log_min_duration：指定したミリ秒数以上かかったクエリに対して EXPLAINを走らせる ◦ auto_explain.log_analyze：EXPLAIN ANALYZEを走らせる、ON推奨 ◦ auto_explain.log_buffers：EXPLAIN BUFFERSを走らせる、ON推奨 46 Collect Postgres EXPLAIN plans using auto_explain 📖-5

① ベンチマーク測定 • EXPLAIN ANALYZEを使って測定する ◦ できれば BUFFERS, VERBOSE, I/O
Timing (track_io_timing) もオンとしてより詳細なデータを取得 ◦ EXPLAIN (ANALYZE, BUFFERS, VERBOSE) • EXPLAINは複数回測定し、cold cache要因をなくす ◦ 一番目のEXPLAINはキャッシュ状況に左右され、安定したベンチマークとなりにくい ◦ 三回走らせ、三回目（もしくは三回の中で一番早いもの）をベースライン・基準とする ◦ Cold cache: Disk I/O, buffer cache, CPU overhead, etc. • 複数のパラメータで測定する ◦ 同じクエリでもパラメータが違うとクエリプランが違う可能性がある ◦ 改善したクエリはあるパラメータセットには非常に有効だが他のものには不利になることもある 47

② 改善箇所の仮説設定 • 遅い・問題のあるノードを見つける ◦ （中・上級者向け）TEXTフォーマットから自力で見つける ◦ プラン可視化ツールを使う ▪ Explain
Dalibo（ダリボー）https://explain.dalibo.com/ ▪ Explain Depesz（デペシュ）https://explain.depesz.com/ ▪ PgAdmin • なぜ遅いか、どうすれば早くなるかを考察する ◦ 一番難しいところ ◦ クエリプランの視点で改善ができる箇所 ▪ Scan方法が適切でない ▪ JOIN方法が適切でない ▪ JOINの順番が適切でない ◦ よくあるパターンを知っておく +Plannerの気持ちが分かる=改善案が出やすい 48 Optimizing slow queries with EXPLAIN to ﬁx bad query plans 📹-2

③ 改善の適用 • インデックス ◦ 不足しているインデックスの追加、既存のインデックスの見直し • 統計情報 ◦ ANALYZEを走らせて統計情報を最新のものにする
◦ CREATE STATISTICSで補足の統計情報を追加する • クエリの書き直し ◦ クエリを単純化したり不要な列取得を省く ◦ CTEをmaterializedする（WITH x AS MATERIALIZED） • プランの強制 ◦ プランのconﬁg parameterの調整（on/off enable_nestloopなど） ◦ pg_hint_planの使用 • その他（プラン系以外） ◦ リソース関連の設定の調整（ work_mem, shared_buffers） ◦ Partition, Sharding 49 PostgreSQL 12の新機能：CTEの高速化 📖-7

クエリ改善の例 50

クエリ改善の例 ① Slow Scan ② ORDER BY + LIMITの罠 51

統計情報 • Postgresはテーブルごとに統計情報をもつ ◦ 統計情報はautovacuumまたは手動ANALYZEによって収集される ◦ pg_statsビューによって情報を見ることができる ◦ 統計情報は各カラムごと（ CREATE
STATISTICSによって複数カラムの統計収集も可能） 52 A Deep Dive into Postgres Statistics 📹-5 column avg_width カラムの平均幅 n_distinct カラムの一意な値の数（ -1は全行が一意であることを示す） most_common_vals カラムの最も共通する値のリスト histogram_bounds カラムの値を満遍なく似たような数でグループに分配した値のリスト

SelectivityとIndex • Selectivity：選択度 ◦ 特定の検索条件においてテーブルから何行選択されるか ◦ Selectivityが高い→index scan、低い→seqential scan ◦
なんにでもインデックスを貼ればいいというわけではない ▪ 貼っても使われない可能性 +インデックスはコストがかかる 53 Postgres' Clever Query Planning System 📖-6

テストデータ ☕ 54 CREATE TABLE products( id int not null,
name text not null, PRIMARY KEY(id) ); INSERT INTO products (id, name) VALUES (1, 'Latte'), (2, 'Espresso'), (3, 'Cappuccino'), (4, 'Americano'), (5, 'Mocha'); ANALYZE products; ☕ テーブル products: カフェの商品

テストデータ 🧾 55 CREATE TABLE orders( id int not null,
product_id int not null, user_id int not null, created_at date not null, PRIMARY KEY (id) ); INSERT INTO orders (id, product_id, user_id, created_at) SELECT generate_series(1, 5000000) AS id, -- 5M件のオーダー floor(random() * 5 + 1)::int AS product_id, -- usersテーブルはないが、 1-100kのユーザーIDを発行することで擬似的に 100kのユーザーがいるとする floor(random() * 100000 + 1)::int AS user_id, -- 100kのユーザー（大体ユーザーあたり 50オーダー） date '2024-01-01' + (random() * 365)::int AS created_at; ANALYZE orders; 🧾 テーブル orders: カフェのオーダー

① Slow Scan 56 EXPLAIN ANALYZE SELECT count(*) FROM orders
WHERE product_id = 5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=55117.28..55117.29 rows=1 width=8) (actual time=106.312..106.978 rows=1 loops=1) -> Gather (cost=55117.06..55117.27 rows=2 width=8) (actual time=106.271..106.972 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=54117.06..54117.07 rows=1 width=8) (actual time=99.692..99.692 rows=1 loops=3) -> Parallel Seq Scan on orders (cost=0.00..53069.67 rows=418958 width=0) (actual time=0.138..88.288 rows=333854 loops=3) Filter: (product_id = 5) Rows Removed by Filter: 1332813 Planning Time: 0.487 ms Execution Time: 107.031 ms (10 rows) Mocha（product_id = 5）のトータルオーダー数が知りたいどんなプランが使われるでしょう

WHERE product_id = 5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=55117.28..55117.29 rows=1 width=8) (actual time=106.312..106.978 rows=1 loops=1) -> Gather (cost=55117.06..55117.27 rows=2 width=8) (actual time=106.271..106.972 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=54117.06..54117.07 rows=1 width=8) (actual time=99.692..99.692 rows=1 loops=3) -> Parallel Seq Scan on orders (cost=0.00..53069.67 rows=418958 width=0) (actual time=0.138..88.288 rows=333854 loops=3) Filter: (product_id = 5) Rows Removed by Filter: 1332813 Planning Time: 0.487 ms Execution Time: 107.031 ms (10 rows) Mocha（product_id = 5）のトータルオーダー数が知りたい

58 Postgres' Clever Query Planning System 📖-7 ① Slow Scan
Parallel Scan

WHERE product_id = 5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=17186.88..17186.89 rows=1 width=8) (actual time=51.927..52.955 rows=1 loops=1) -> Gather (cost=17186.66..17186.87 rows=2 width=8) (actual time=51.822..52.937 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=16186.66..16186.67 rows=1 width=8) (actual time=46.825..46.826 rows=1 loops=3) -> Parallel Index Only Scan using orders_product_id_idx on orders (cost=0.43..15139.27 rows=418958 width=0) (actual time=0.084..28.528 rows=333854 loops=3) Index Cond: (product_id = 5) Heap Fetches: 0 Planning Time: 0.277 ms Execution Time: 53.019 ms (10 rows) CREATE INDEX ON orders(product_id); コスト約1/3、実行時間約半分

60 Postgres' Clever Query Planning System 📖-7 ① Slow Scan
Bad index scan

WHERE user_id = 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Aggregate (cost=5.45..5.46 rows=1 width=8) (actual time=0.293..0.295 rows=1 loops=1) -> Index Only Scan using orders_user_id_idx on orders (cost=0.43..5.33 rows=51 width=0) (actual time=0.255..0.272 rows=47 loops=1) Index Cond: (user_id = 1) Heap Fetches: 0 Planning Time: 0.827 ms Execution Time: 0.418 ms (6 rows) CREATE INDEX ON orders(user_id); selectivityのよいWHERE句 💡インデックスの効果が最大限に利用される

② ORDER BY + LIMITの罠 63 EXPLAIN ANALYZE SELECT *
FROM orders WHERE user_id = 1 ORDER BY created_at DESC LIMIT 30; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=205.61..205.63 rows=5 width=16) (actual time=0.287..0.289 rows=5 loops=1) -> Sort (cost=205.61..205.74 rows=51 width=16) (actual time=0.284..0.285 rows=5 loops=1) Sort Key: created_at DESC Sort Method: top-N heapsort Memory: 25kB -> Bitmap Heap Scan on orders (cost=4.83..204.77 rows=51 width=16) (actual time=0.100..0.250 rows=47 loops=1) Recheck Cond: (user_id = 1) Heap Blocks: exact=47 -> Bitmap Index Scan on orders_user_id_idx (cost=0.00..4.82 rows=51 width=0) (actual time=0.060..0.060 rows=47 loops=1) Index Cond: (user_id = 1) Planning Time: 0.315 ms Execution Time: 0.338 ms 直近のuser_id=1のオーダー30個が知りたいどんなプランが使われるでしょう

FROM orders WHERE user_id = 1 ORDER BY created_at DESC LIMIT 30; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=204.27..204.34 rows=30 width=16) (actual time=1.116..1.127 rows=30 loops=1) -> Sort (cost=204.27..204.39 rows=51 width=16) (actual time=1.112..1.117 rows=30 loops=1) Sort Key: created_at DESC Sort Method: top-N heapsort Memory: 26kB -> Bitmap Heap Scan on orders (cost=4.83..202.82 rows=51 width=16) (actual time=0.328..1.071 rows=67 loops=1) Recheck Cond: (user_id = 1) Heap Blocks: exact=67 -> Bitmap Index Scan on orders_user_id_idx (cost=0.00..4.82 rows=51 width=0) (actual time=0.283..0.283 rows=67 loops=1) Index Cond: (user_id = 1) Planning Time: 1.180 ms Execution Time: 1.174 ms (11 rows) 直近のuser_id=1のオーダー30個が知りたい

② ORDER BY + LIMITの罠 66 INSERT INTO orders SELECT
generate_series(5000001, 5100000), -- 新しく100kのオーダー floor(random() * 5 + 1)::int AS product_id, floor(random() * 10 + 1)::int AS user_id, -- ユーザーid 1-10のみでオーダー date '2024-01-01' + (random() * 365)::int AS created_at; ちょっとuser_idのselectivityを悪くしてみる SELECT n_distinct, most_common_vals, most_common_freqs FROM pg_stats WHERE attname = 'user_id'; -[ RECORD 1 ]-----+--------- n_distinct | 97702 most_common_vals | {53246} most_common_freqs | {0.0002} -[ RECORD 1 ]-----+--------------------------- n_distinct | 92373 most_common_vals | {6,1,3,2,7,8,4,9,10,5} most_common_freqs | {0.0022333334,0.0021,0.0020333333,0.002,0.002,0 .002,0.0018666667,0.0018,0.0017666667,0.0017}

FROM orders WHERE user_id = 1 ORDER BY created_at DESC LIMIT 30; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=0.43..605.41 rows=30 width=16) (actual time=0.058..11.740 rows=30 loops=1) -> Index Scan Backward using orders_created_at_idx on orders (cost=0.43..215976.74 rows=10710 width=16) (actual time=0.056..11.734 rows=30 loops=1) Filter: (user_id = 1) Rows Removed by Filter: 7280 Planning Time: 0.210 ms Execution Time: 11.779 ms (6 rows) 直近のuser_id=1のオーダー30個が知りたいどんなプランが使われるでしょう

FROM orders WHERE user_id = 1 ORDER BY created_at DESC LIMIT 30; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=0.43..605.41 rows=30 width=16) (actual time=0.058..11.740 rows=30 loops=1) -> Index Scan Backward using orders_created_at_idx on orders (cost=0.43..215976.74 rows=10710 width=16) (actual time=0.056..11.734 rows=30 loops=1) Filter: (user_id = 1) Rows Removed by Filter: 7280 Planning Time: 0.210 ms Execution Time: 11.779 ms (6 rows) 直近のuser_id=1のオーダー30個が知りたい

FROM orders WHERE user_id = 1 ORDER BY created_at DESC LIMIT 1000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=1000.46..16789.51 rows=1000 width=16) (actual time=2038.116..2046.743 rows=1000 loops=1) -> Gather Merge (cost=1000.46..180837.81 rows=11390 width=16) (actual time=2038.115..2046.706 rows=1000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Index Scan Backward using orders_created_at_idx on orders (cost=0.43..178523.10 rows=4746 width=16) (actual time=148.825..1387.316 rows=518 loops=3) Filter: (user_id = 1) Rows Removed by Filter: 1632248 Planning Time: 0.236 ms Execution Time: 2046.784 ms (9 rows) 直近のuser_id=1のオーダー1000個が知りたい（追加されたオーダーが全て年初） Postgres Planner Quirks: The impact of ORDER BY + LIMIT on index usage 📖-9

FROM orders WHERE user_id = 1 ORDER BY created_at+0 DESC LIMIT 1000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=22119.99..22122.49 rows=1000 width=20) (actual time=10.200..10.389 rows=1000 loops=1) -> Sort (cost=22119.99..22148.47 rows=11390 width=20) (actual time=10.197..10.283 rows=1000 loops=1) Sort Key: ((created_at + 0)) DESC Sort Method: top-N heapsort Memory: 161kB -> Bitmap Heap Scan on orders (cost=136.70..21495.49 rows=11390 width=20) (actual time=1.920..7.302 rows=10280 loops=1) Recheck Cond: (user_id = 1) Heap Blocks: exact=596 -> Bitmap Index Scan on orders_user_id_idx (cost=0.00..133.86 rows=11390 width=0) (actual time=1.814..1.814 rows=10280 loops=1) Index Cond: (user_id = 1) Planning Time: 0.826 ms Execution Time: 10.585 ms 直近のuser_id=1のオーダー1000個が知りたい（追加されたオーダーが全て年初）

クエリプランのトラッキング 71 • クエリがどのようなプランをどれだけ実行しているか ◦ (auto_explain) ◦ aurora_stat_plans ◦ pg_store_plans

まとめ • EXPLAINはslow queryをデバッグする強い味方 ◦ プランを見ることでどこが遅かったかが分かる ◦ プランが最適かを見極める • Plannerの気持ちによりそって、よいインデックス・統計情報を与えてあげ
て、Plannerが最適なプランを選ぶよう助けてあげる 72

Thank You! https://speakerdeck.com/keiko713/explain-explain @keiko713 73

📖 References (Blogs, Documents) 📖 1. Postgres document: Using EXPLAIN
◦ これさえ読めば基本はカバーできる。 ◦ [EN] https://www.postgresql.org/docs/current/using-explain.html ◦ [JA] https://www.postgresql.jp/document/16/html/using-explain.html 2. Understanding Postgres IOPS: Why They Matter Even When Everything Fits in Cache – John Dalton (Crunchy Data) ◦ Postgresがディスクによくアクセスするという特性から、 IOPSはPostgresのパフォーマンスを語る上で避けられないトピック。 PostgresのIO周りについてしっかり学ぶことができる。 ◦ [EN] https://www.crunchydata.com/blog/understanding-postgres-iops 3. Monitoring Postgres EXPLAIN plans – pganalyze docs ◦ 各ノードについての説明、また EXPLAINのプランから読み取れる情報について詳しく解説。 ◦ [EN] https://pganalyze.com/docs/explain 74

📖 References (Blogs, Documents) 📖 4. チューニング～ SQLチューニングを実施する～
– Fujitsu PostgreSQLインサイド ◦ スキャン方法と結合方法について日本語で図付きでわかりやすく説明されている。 ◦ [JA] https://www.fujitsu.com/jp/products/software/resources/feature-stories/postgres/article-i ndex/implement-sqltuning/ 5. Collect Postgres EXPLAIN plans using auto_explain – pganalyze docs ◦ 各プロバイダごとのauto_explainの導入方法が記載されており、またおすすめの設定も書いてある。 ◦ [EN] https://pganalyze.com/docs/explain/setup/auto_explain 6. Postgres' Clever Query Planning System – Paul Ramsey (Crunchy Data) ◦ インデックス、selectivity、統計情報の関係を直感的にわかりやすい図をたくさん交えて解説している。英語を読まなくても図を見るだけでもためになる。 ◦ [EN] https://www.crunchydata.com/blog/indexes-selectivity-and-statistics 75

📖 References (Blogs, Documents) 📖 7. PostgreSQL 12の新機能：CTEの高速化 ◦ PostgreSQL
12からCTEがデフォルトでマテリアライズ化されていないことについて例も含めて詳しく解説してある。 ◦ [JA] https://www.sraoss.co.jp/tech-blog/pgsql/pg12-cte/ 8. Tracking Postgres Buffer Cache Statistics over time with pganalyze ◦ pg_buffercacheエクステンションを使ってバッファの使用状況の推移が見れる機能を紹介している。 ◦ [EN] https://pganalyze.com/blog/tracking-postgres-buffer-cache-statistics 9. Postgres Planner Quirks: The impact of ORDER BY + LIMIT on index usage ◦ ORDER BY + LIMITの罠について実例も交えて詳しく解説してある。 ◦ [EN] https://pganalyze.com/blog/5mins-postgres-planner-order-by-limit 76

🗄 References (Data Sources) 🗄 1. tenk1 and tenk2 tables
– Postgres source code ◦ tenkっていうのはちなみにテンケー、つまり 10kです（いつもテンキって読んでた）。 ◦ https://github.com/postgres/postgres/blob/master/src/test/regress/sql/test_setup.sql 2. Procurement Notice – World Bank Open Data ◦ ちょっと大きめのデータが手軽に欲しいときになかなか使える。データは CSVで簡単にダウンロードでき、またcolumn detailsがしっかりしているのでそれを元に table deﬁnitionを作りやすい。 ◦ https://ﬁnancesone.worldbank.org/procurement-notice/DS00979 77

📹 References (Videos) 📹 1. Explaining the Postgres Query Optimizer
| Citus Con: An Event for Postgres 2022 - Bruce Momjian ◦ Postgresの統計情報がquery planningにどのように影響を与えるかが実例を交えて解説されている非常に腹落ちしやすい。 ◦ [EN] https://www.youtube.com/watch?v=wLpcVM9qxV0 2. Webinar recording: Optimizing slow queries with EXPLAIN to ﬁx bad query plans - Lukas Fittl (pganalyze) ◦ このスライドを作るにあたっても参考にした webinar。EXPLAINのoverview、クエリ最適化についてカバーされており、また例もたくさん提示されている。スライドにある 70を超えるQ&Aも非常に有益な情報が多い。 ◦ [EN] https://www.youtube.com/watch?v=NE-cf1h301I ◦ [EN, slides] https://resources.pganalyze.com/pganalyze_Query_Optimization_EXPLAIN_Webinar_Jun e_2024.pdf 78

📹 References (Videos) 📹 3. Postgres Platform "Best Practices" for
the Modern DBA (PGConf.EU 2024) - Chelsea Dole ◦ EXPLAINについてはauto_explainくらいしか話されていなく、トークも「 Postgresを誰かのためにマネージするベストプラクティス」がメインだが、 Postgresを管理する上での経験に基づいたベストプラクティスがつまっていてとてもよいトーク。 ◦ [EN] https://www.youtube.com/watch?v=R8RBkx-ysqg 4. Internals of physical join operators (SQL with BRET) - Bret Wagner ◦ SQL Serverではどんな風にJOINが行われているのかを説明したビデオ群。 Postgresとは細かいところで違うところもあるかもしれないが、大筋は同じ。彼の英語はわかりやすく、テンポもすごくよく、またアニメーションですっきり 3つのJOINが学べる。 ◦ [EN] Nested Loop Join: https://www.youtube.com/watch?v=0arjvMJihJo ◦ [EN] Merge Join: https://www.youtube.com/watch?v=IFUB8iw46RI ◦ [EN] Hash Match Join: https://www.youtube.com/watch?v=59C8c7p_hII 79

📹 References (Videos) 📹 5. A Deep Dive into Postgres
Statistics (PGConf.EU 2024) - Louise Grandjonc ◦ Postgresの統計情報について深堀りしたトーク。たくさんの具体例と mathがあり腹落ちしやすい内容。 ◦ [EN] https://www.youtube.com/watch?v=ApAClPFJ_rU 80

おまけスライド 81

JOINとは • 複数のテーブルを組み合わせて、必要なデータを一度に取り出す • 共通のカラムを基準にテーブル同士を結びつけ、データを検索・取得する 82 id name org_id 1
Alice 123 2 Amber 125 3 Beth 123 id name 123 Apple 124 Banana 125 Orange userid username orgname 1 Alice Apple 2 Amber Orange 3 Beth Apple users orgs SELECT … FROM users u JOIN orgs o ON u.org_id = o.id;

プランノード - Join Nodes Join Nodes Nested Loop SELECT *
FROM small_table JOIN large_table ON small_table.id = large_table.some_id; (index on large_table.id) Merge Join SELECT * FROM large_table1 JOIN large_table2 ON large_table1.created_at > large_table2.last_seen; (tables are sorted by join keys or have indexes on join keys) Hash Join SELECT * FROM smaller_table JOIN larger_table ON smaller_table.id = larger_table.some_id; (equi-join) 83 Join Nodes: 2つの子ノードを結合して行を取得する Internals of physical join operators 📹-4

EXPLAINの読み方 - BUFFERS, I/O Timing EXPLAIN (ANALYZE, BUFFERS) SELECT *
FROM tenk1 UNION ALL SELECT * FROM tenk1; QUERY PLAN ----------------------------------------------------------------------------------------- -------------------------------- Append (cost=0.00..1000.50 rows=20700 width=244) (actual time=1.620..11.817 rows=20000 loops=1) Buffers: shared hit=355 read=335 -> Seq Scan on tenk1 (cost=0.00..448.50 rows=10350 width=244) (actual time=1.619..7.446 rows=10000 loops=1) Buffers: shared hit=10 read=335 -> Seq Scan on tenk1 tenk1_1 (cost=0.00..448.50 rows=10350 width=244) (actual time=0.013..1.726 rows=10000 loops=1) Buffers: shared hit=345 Planning Time: 0.413 ms Execution Time: 13.421 ms (8 rows) 84

Explain EXPLAIN

Explain EXPLAIN

More Decks by Keiko Oda

Other Decks in Technology

Featured

Transcript