Lock in $30 Savings on PRO—Offer Ends Soon! ⏳

画像検索今昔物語

yu4u
April 16, 2018

 画像検索今昔物語

古典的な特徴点ベースの手法から、深層学習を用いた手法まで、30分で画像検索の全てが分かるを目指して作成した(やっつけ)資料です。なお発表時間は足りなかった模様。

yu4u

April 16, 2018
Tweet

More Decks by yu4u

Other Decks in Technology

Transcript

  1. 特定物体認識 1 • 類似画像検索 • ⼀般物体認識 (クラス分類) • 特定物体認識 同じ物体(インスタンス)が写っている画像を検出

    Result Query Query 空、雲 Result Query 大規模特定物体認識の最新動向 https://sites.google.com/site/yu4uchida/uchida_ieice2013.pdf
  2. ⼤域特徴ベース vs 局所特徴ベース 2 • ⼤域特徴 (global feature) ベース –

    画像から1つの特徴を抽出(e.g. カラーヒストグラム) – 類似画像検索ではうまくいくが 特定物体認識ではうまくいかない • 局所特徴 (local feature) ベース – 画像から多数の局所特徴を抽出(e.g. SIFT) – それらのマッチング結果により類似度を定義 – SIFT等の強⼒な特徴量により deep learningに最後まで抵抗(最近やられた模様)
  3. 局所特徴ベース特定物体認識 4 • Detection︓局所特徴領域の検出 • Description︓局所特徴領域の記述 • Indexing&Search︓(近似)最近傍探索 • Post

    process – Geometric verification – Query expansion セットになることが多いが 本来は独⽴して選択できる
  4. 局所特徴を⽤いた特定物体認識 5/25/23 5 ①Extract local regions (patches) from images ②Describe

    the patches by d-dimensional vectors ③Make correspondences between similar patches ④Calculate similarity between the images Similarity: 3 Position (x, y) Orientation θ Scale σ Feature vector f (e.g., 128-dim SIFT) Local feature
  5. 局所特徴領域の検出⼿法 9 Hessian Beaudet’78 Harris Harris’88 LoG Lindeberg’98 DoG Lowe’99

    SURF Bay’06 Harris-Laplace Mikolajczyk’01 Hessian-Affine Mikolajczyk’04 Harris-Affine Mikolajczyk’02 FAST Rosten’05 Affine-invariant Scale-invariant Rotation-invariant LoG scale seletion Affine adaptation Multi-scale + Box filter acceleration LoG approximation Hessian-Laplace Mikolajczyk’01 Oriented FAST Rublee’11 SUSAN Smith’97 Simplification + tree acceleration Orientation Corner-like Blob-like (SIFT) (ORB)
  6. 局所特徴領域の記述⼿法 10 • 実数値タイプとバイナリタイプがある SIFT Lowe’99 SURF Bay’06 BRIEF Calonder’10

    ORB Rublee’11 GLOH Mikolajczyk’05 FREAK Alahi’12 A-KAZE Alcantarilla’13 LDB Yang’12 LATCH Levi’16 BRISK Leutenegger’11 Real-valued Binary (0.56, 0.22, -0.10, …, 0.96) (1, 0, 0, …, 1) RootSIFT Arandjelovic’12
  7. どれを使えば良いの︖ 11 • 精度重視 – SIFT or Hessian Affine detector

    + RootSIFT descriptor • 速度重視 – ORB detector + ORB descriptor • Local Feature Detectors, Descriptors, and Image Representations: A Survey https://arxiv.org/abs/1607.08368
  8. RootSIFT [Arandjelovic+, CVPRʼ12] 5/25/23 12 • Hellinger kernel works better

    than Euclidean distance in comparing histograms such as SIFT • Hellinger kernel (Bhattacharyyaʼs coefficient) for L1 normalized histograms x and y: • Explicit feature map of x into xʼ : – L1 normalize x – element-wise square root x to give xʼ – then xʼ is L2 normalized • Computing Euclidean distance in the feature map space is equivalent to Hellinger distance in the original space: RootSIFT RootSIFT
  9. Large-scale Object Recognition 5/25/23 13 ・ ・ ・ Distance calculation

    Query image Reference images Explicit feature matching requires high computational cost and memory footprint Match Bag-of-visual words!
  10. Bag-of-Visual Words [Sivic+, ICCVʼ03] 5/25/23 14 • Offline – Collect

    a large number of training vectors – Perform clustering algorithm (e.g., k-means) – Centroids of clusters = visual words (VWs) • Online: – All features are assigned to their nearest visual words – An image is represented by the frequency histogram of VWs – (Dis)similarity is defined by the distance between histograms Visual words (VW) VW1 VWn VW2 … Visual words - - " " " - - - " " " - - - " " " - - - " " " - - - " " " - Frequency } 1 | { N i i £ £ = v V
  11. Bag-of-Visual Words [Sivic+, ICCVʼ03] 5/25/23 15 15 VW1 VW2 VWk

    VWn ・ ・ ・ ・ ・ ・ Indexing step (quantization) Search step (quantization) Match Match Matching can be performed in O(1) with an inverted index Query image Reference images Nearest VW
  12. 1 2 w N Inverted index Image ID 1 2

    3 4 5 6 7 8 9 10 11 12 ... Image ID Accumulated scores VW ID Obtain image IDs Query image Reference image Image ID ... (x, y) σ θ (1) Feature detection (2) Feature description (3) Quantization (1) Feature detection (2) Feature description (3) Quantization (4) Voting ... ... ... ... Visual word v1 ... Visual word vw ... Visual word vN Visual words 1 4 5 7 10 16 19 Offline step Visual word v1 ... Visual word vw ... Visual word vN Visual words Get images with the top-K scores Results inlier outlier (5) Geometric verification 全体処理 Geometric verification
  13. モデル; pʼ = Mp 18 rotation scaling translation similarity trans.

    affine trans. perspective trans. 1DoF 2DoF 1DoF 4DoF 5DoF 6DoF 7DoF Fundamental Matrix
  14. Average Query Expansion [Chum+, ICCVʼ07] 5/25/23 23 • Obtain top

    (m < 50) verified results of original query • Construct new query using average of these results Without geometric verification, QE degrades accuracy! Query image Verified results New query
  15. Multiple Image Resolution Expansion [Chum+, ICCVʼ07] 5/25/23 24 ROI Query

    image ROI ROI ROI ROI ROI ROI First verified results ROI ROI ROI ROI ROI ROI • Calculate relative change in resolution • Construct average query for each resolution New query1 New query2 New query3
  16. Query Expansion Results 5/25/23 25 • ori = original query

    • qeb = query expansion baseline • trc = transitive closure expansion • avg = average query expansion • rec = recursive average query expansion • sca = multiple image resolution expansion
  17. Discriminative Query Expansion [Arandjelovic+, CVPRʼ12] 5/25/23 26 • Train a

    linear SVM classifier – Use verified results as positive training data – Use low ranked images as negative training data – Rank images on their signed distance from the decision boundary – Reranking can be efficient with an inverted index!
  18. 最近傍探索 (Nearest Neighbor Search, NNS) 28 • 距離空間 M における点の集合

    S とクエリ点 q∈M が 与えられた際に S の中で q に最も近い点を探す – k近傍 / range search • ユークリッド空間での最近傍探索を扱うことがほとんど • kd-tree, SR-tree等のindexingにより⾼速化 (⾼次元(数⼗︖)で次元の呪いにかかる) + + + + + + + + + + + + o q Input + + + + + + + + + + + + o q Output S
  19. 近似最近傍探索 29 • エラーを許す代わりに⾼速化、エラー率とトレードオフ – 速度、精度、メモリ使⽤量がトレードオフになる • ⽊構造+priority search –

    kd-tree, randomized kd-trees, hierarchical kd-tree – メモリを気にしなければ無難で良い • Locality Sensitive Hashing (LSH) 系 – ***LSHがいっぱい。個⼈的には嫌い • 直積量⼦化系 – サーベイ → https://www.jstage.jst.go.jp/article/mta/6/1/6_2/_article/-char/ja/ – データを圧縮し、圧縮したまま検索 • バイナリ圧縮系 – いっぱいある https://www.slideshare.net/ren4yu/k-means-hashing-up (Heさんだよ) – バイナリ符号にするのでpopcnt命令で距離計算できる (がそのままだとlinear search)
  20. CNN系 (global feature) 31 • CNN Features off-the-shelf: an Astounding

    Baseline for Recognition https://arxiv.org/abs/1403.6382 – クラス分類⽤のCNN (OverFeat) のFCをそのまま使っても結構良い • Neural Codes for Image Retrieval https://arxiv.org/pdf/1404.1777.pdf – 最終層前のFCを使ったほうが良いとか、検索対象のドメインで finetuneしたほうが良いとか • CNN Image Retrieval Learns from BoW: Unsupervised Fine- Tuning with Hard Examples https://arxiv.org/abs/1604.02426 – Siamese Networkで学習 • Global featureでもかなり良い(vs. FV/VLAD) • 基本的に回転・スケール不変ではないことに注意
  21. CNN系 (local feature) 32 • LIFT: Learned Invariant Feature Transform

    https://arxiv.org/abs/1603.09114 – 検出、⾓度推定、記述をend-to-endで学習 – 遅いし検索では精度出ていない • Large-Scale Image Retrieval with Attentive Deep Local Features https://arxiv.org/abs/1612.06321 – FCN+アテンション(マルチスケールでやる)で局所特徴を定義 – 良さげ https://github.com/tensorflow/models/tree/master/researc h/delf – 回転不変性は担保されない
  22. Comparative Study 34 • Revisiting Oxford and Paris: Large-Scale Image

    Retrieval Benchmarking https://arxiv.org/abs/1803.11285 – Local, global, CNN/⾮CNNが網羅的に⽐較されている (が、著者らのチームにバイアスがかかっているかも) Local Global 非CNN CNN
  23. ベストプラクティス① 35 • Global → https://arxiv.org/abs/1711.02512 – 性能の良いベースネットワークを利⽤(ResNet以上) し、finetune(Siamere?)する –

    generalized mean-pooling (Lp, p=3) を利⽤ – 複数スケール (region) を利⽤ – RegionレベルでDiffusionベースのquery expansion https://arxiv.org/abs/1611.05113
  24. ベストプラクティス② 36 • Local → https://hal.inria.fr/hal-01131898/document – 特徴量としてはDELFを利⽤ – Indexing,

    matching, scoringがややこしい(ASMK – Geometric verificationは必須 – Query expansionもやる