
Dimensionality Reduction (Principal Component Analysis, Linear Discriminant Analysis, Kernel Principal Component Analysis)


Slides for an internal lab presentation.


Takahiro Kawashima

June 03, 2020

Transcript

1. Pain points of high-dimensional data
When the dimensionality of the data grows ...
- interpretability drops
- relationships between data points become hard to visualize
- (almost) linearly dependent axes appear (multicollinearity)
- the results of ML algorithms become unstable (Lasso, etc.)
- underdetermined problems show up: least squares cannot be solved unless (number of samples) ≥ (number of dimensions)
- and so on
→ Solved if we can find a low-dimensional representation that preserves as much of the high-dimensional data's character as possible!
2. What are the "characteristics" of high-dimensional data?
Dimensionality-reduction methods vary according to which "characteristics" of the high-dimensional data they try to preserve. Covered today:
Unsupervised:
- Principal component analysis (PCA): interpret the spread of the data along an axis as richness of information, and maximize it
- Kernel PCA: a further, nonlinear extension of PCA
Supervised:
- Linear discriminant analysis (LDA): maximize the performance of a linear classifier in the reduced space
Many other important methods exist as well: Isomap, LLE, t-SNE, KDR, and so on.
3. PCA: the variance-maximization view
[Figure: maximize the variance along an axis.]
An axis along which the data spread (⇔ variance) is large should carry a lot of information
↓
1. Find the axis $u_1$ with the largest data variance.
2. Among axes orthogonal to $u_1$, find the axis $u_2$ with the next-largest data variance.
3. Continue in the same way.
Among the resulting $\{u_i\}$, the axes with small $i$ should capture the data's information well. This method is called principal component analysis (PCA).
4. PCA: formulation via variance maximization (1)
Sample set: $\{x_n\}_{n=1}^N$, $x_n \in \mathbb{R}^D$. Goal: find the axis $u_1 \in \mathbb{R}^D$ that best captures the variance of $\{x_n\}$. Only the direction of the axis matters, so add the condition $\|u_1\| = (u_1^\top u_1)^{1/2} = 1$. Define the mean and covariance matrix of $\{x_n\}$:
$\bar{x} := \frac{1}{N}\sum_{n=1}^N x_n \in \mathbb{R}^D$,  $S := \frac{1}{N}\sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^\top \in \mathbb{R}^{D \times D}$.
5. PCA: formulation via variance maximization (2)
Since $\|u_1\| = 1$, the projection of a point $x_n \in \mathbb{R}^D$ onto the axis $u_1 \in \mathbb{R}^D$ is $u_1^\top x_n$. Hence the variance of the data after projection onto $u_1$ is
$\frac{1}{N}\sum_{n=1}^N (u_1^\top x_n - u_1^\top \bar{x})^2 = u_1^\top \Bigl\{\frac{1}{N}\sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^\top\Bigr\} u_1 = u_1^\top S u_1$.
Maximize the rightmost expression $u_1^\top S u_1$ with respect to $u_1$ under the condition $u_1^\top u_1 = 1$!
6. PCA: formulation via variance maximization (3)
The maximization problem to solve is:
Maximize $u_1^\top S u_1$ subject to $1 - u_1^\top u_1 = 0$.
Introducing a Lagrange multiplier turns this into
Maximize $L(u_1, \lambda_1) = u_1^\top S u_1 + \lambda_1(1 - u_1^\top u_1)$.
The objective is a quadratic form in $u_1$, so we simply look for stationary points:
$\frac{\partial L(u_1, \lambda_1)}{\partial u_1} = 2Su_1 - 2\lambda_1 u_1 = 0 \iff Su_1 = \lambda_1 u_1$.
7. PCA: formulation via variance maximization (4)
The variance-maximization problem has been reduced to the form $Su_1 = \lambda_1 u_1$. Recalling that $\|u_1\| = 1$, this is exactly an eigenvalue problem of the covariance matrix $S$ of the data $\{x_n\}$! Further, multiplying both sides by $u_1^\top$ from the left and using $u_1^\top u_1 = 1$,
$Su_1 = \lambda_1 u_1 \iff u_1^\top S u_1 = \lambda_1$
→ the largest eigenvalue $\lambda_1$ is the variance of the data along the first principal axis $u_1$! In general, the $m$-th principal axis $u_m$ is obtained as the eigenvector corresponding to the $m$-th largest eigenvalue of $S$.
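This eigenproblem is easy to verify numerically. Below is a minimal NumPy sketch (the toy data and variable names are illustrative assumptions, not from the deck):

```python
import numpy as np

# Check "variance along u1 = top eigenvalue of S" on toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # N = 200 samples, D = 5 (toy data)

S = np.cov(X, rowvar=False, bias=True)   # covariance with the 1/N convention above
eigvals, eigvecs = np.linalg.eigh(S)     # eigh: ascending eigenvalues of symmetric S
lam = eigvals[::-1]                      # sort descending
U = eigvecs[:, ::-1]

u1 = U[:, 0]                             # first principal axis
proj = (X - X.mean(axis=0)) @ u1         # data projected onto u1
print(np.isclose(lam[0], proj.var()))    # True: lambda_1 is the projected variance
```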
8. PCA: the error-minimization view
PCA can also be derived from the policy of choosing axes so as to minimize the approximation error.
[Figure: minimize the error of projecting onto the axes.]
Project the $D$-dimensional data onto $M (< D)$ principal axes
↓
choose an orthonormal basis $\{u_m\}_{m=1}^M$ that minimizes the residual in the remaining $(D - M)$-dimensional subspace.
This view gives good insight into PCA, so let's follow it as well.
9. PCA: formulation via error minimization (1)
Take a set of $D$ orthonormal basis vectors $\{u_m\}_{m=1}^D$:
$u_l^\top u_m = \delta_{lm} = \begin{cases} 1 & (l = m) \\ 0 & (l \neq m). \end{cases}$
Any data point $x_n \in \mathbb{R}^D$ can be expressed as a linear combination of $\{u_m\}_{m=1}^D$:
$x_n = \sum_{m=1}^D \alpha_{nm} u_m$,  (1)
Multiplying both sides by $u_l^\top$ from the left,
$u_l^\top x_n = \sum_{m=1}^D \alpha_{nm} u_l^\top u_m = \sum_{m=1}^D \alpha_{nm}\delta_{lm} = \alpha_{nl}$.
Substituting this into (1),
$x_n = \sum_{m=1}^D (u_m^\top x_n)\, u_m$.
10. PCA: formulation via error minimization (2)
Now, a point $\tilde{x}_n$ approximating the data point $x_n \in \mathbb{R}^D$ with $M$ axes should be expressible as
$\tilde{x}_n = \sum_{m=1}^M \alpha_{nm} u_m + \sum_{m=M+1}^D \beta_m u_m$.
[Figure: a 2-D → 1-D transformation.]
The free parameters are $\{\alpha_{nm}\}, \{\beta_m\}, \{u_m\}$. Determine the parameters so as to minimize the (mean) squared error
$J = \frac{1}{N}\sum_{n=1}^N \|x_n - \tilde{x}_n\|^2$.
11. PCA: formulation via error minimization (3)
Expanding $\tilde{x}_n$ inside $J$:
$J = \frac{1}{N}\sum_{n=1}^N \|x_n - \tilde{x}_n\|^2 = \frac{1}{N}\sum_{n=1}^N \Bigl[x_n^\top x_n - 2x_n^\top\Bigl(\sum_{m=1}^M \alpha_{nm} u_m + \sum_{m=M+1}^D \beta_m u_m\Bigr) + \sum_{m=1}^M \alpha_{nm}^2\Bigr] + \sum_{m=M+1}^D \beta_m^2$.
Taking the first-order optimality conditions of this $J$ for $\alpha_{nm}$ and $\beta_m$:
$\frac{\partial J}{\partial \alpha_{nm}} = \frac{2}{N}(-x_n^\top u_m + \alpha_{nm}) = 0 \Rightarrow \alpha_{nm} = x_n^\top u_m$
$\frac{\partial J}{\partial \beta_m} = -\frac{2}{N}\sum_{n=1}^N x_n^\top u_m + 2\beta_m = 0 \Rightarrow \beta_m = \bar{x}^\top u_m$
So $\tilde{x}_n = \sum_{m=1}^M \alpha_{nm} u_m + \sum_{m=M+1}^D \beta_m u_m$ becomes
$\tilde{x}_n = \sum_{m=1}^M (x_n^\top u_m) u_m + \sum_{m=M+1}^D (\bar{x}^\top u_m) u_m$.
12. PCA: formulation via error minimization (4)
Recall that $x_n = \sum_{m=1}^D (u_m^\top x_n) u_m$ held for every data point. Combining it with the previous slide's $\tilde{x}_n = \sum_{m=1}^M (x_n^\top u_m) u_m + \sum_{m=M+1}^D (\bar{x}^\top u_m) u_m$,
$x_n - \tilde{x}_n = \sum_{m=1}^D (u_m^\top x_n) u_m - \sum_{m=1}^M (x_n^\top u_m) u_m - \sum_{m=M+1}^D (\bar{x}^\top u_m) u_m = \sum_{m=M+1}^D [(x_n - \bar{x})^\top u_m]\, u_m$,
so $J = \frac{1}{N}\sum_{n=1}^N \|x_n - \tilde{x}_n\|^2$ becomes
$J = \frac{1}{N}\sum_{n=1}^N \sum_{m=M+1}^D u_m^\top (x_n - \bar{x})(x_n - \bar{x})^\top u_m = \sum_{m=M+1}^D u_m^\top \Bigl\{\frac{1}{N}\sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^\top\Bigr\} u_m = \sum_{m=M+1}^D u_m^\top S u_m$.
13. PCA: formulation via error minimization (5)
Consider reducing from $D = 2$ to $M = 1$. The minimization problem to solve is
Minimize $J = u_2^\top S u_2$ subject to $1 - u_2^\top u_2 = 0$,
and with a Lagrange multiplier,
Minimize $L(u_2, \lambda_2) = u_2^\top S u_2 + \lambda_2(1 - u_2^\top u_2)$.
Solving the first-order optimality condition just as in variance maximization gives
$Su_2 = \lambda_2 u_2$ — an eigenvalue problem! $\Rightarrow J = u_2^\top S u_2 = \lambda_2 u_2^\top u_2 = \lambda_2$.
So PCA again reduces to the eigenvalue problem of $S$, and we also learned that the eigenvalue of a discarded axis is the squared error of the low-dimensional approximation (hooray!)
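The identity "J = sum of the discarded eigenvalues" can likewise be checked numerically; a small sketch on toy data (data and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                # toy data: N = 500, D = 4
xbar = X.mean(axis=0)
Xc = X - xbar

lam, U = np.linalg.eigh(Xc.T @ Xc / len(X))  # ascending eigenvalues of S
M = 2                                        # number of axes to keep
top = U[:, -M:]                              # eigenvectors of the M largest eigenvalues

Xtil = xbar + Xc @ top @ top.T               # reconstruction from the top-M axes
J = np.mean(np.sum((X - Xtil) ** 2, axis=1)) # mean squared reconstruction error
print(np.isclose(J, lam[:-M].sum()))         # True: J = sum of discarded eigenvalues
```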
14. Singular value decomposition
Singular value decomposition (SVD): any $X \neq O \in \mathbb{R}^{D \times N}$ can be expressed as
$X = U\Sigma V^\top$
with orthogonal matrices $U \in \mathbb{R}^{D \times D}$, $V \in \mathbb{R}^{N \times N}$ and a matrix $\Sigma \in \mathbb{R}^{D \times N}$ with nonzero values only on the diagonal.¹ The SVD can be seen as a generalization of eigendecomposition.
¹ With $\mathrm{rank}(X) = K \leq \min\{D, N\}$, there is also the convention $U \in \mathbb{R}^{D \times K}$, $V \in \mathbb{R}^{N \times K}$, $\Sigma \in \mathbb{R}^{K \times K}$, which is also frequently used.
15. PCA: relation to the SVD (1)
From zero-mean data $\{x_n\}$, define $X = (x_1, \ldots, x_N) \in \mathbb{R}^{D \times N}$.² The covariance matrix is $S = N^{-1}\sum_{n=1}^N x_n x_n^\top = N^{-1} XX^\top$ (note the zero mean). Applying the SVD to the $X$ inside the expression for $S$ (with $X = U\Sigma V^\top$):
$S = \frac{1}{N} U\Sigma V^\top (U\Sigma V^\top)^\top = \frac{1}{N} U\Sigma^2 U^\top$.
Writing the eigendecomposition of $S$ given by PCA as $S = W\Lambda W^\top$,³
$W\Lambda W^\top = U\Bigl(\frac{1}{N}\Sigma^2\Bigr)U^\top$.
The orthogonal matrices $W \leftrightarrow U$ and the diagonal matrices $\Lambda \leftrightarrow \frac{1}{N}\Sigma^2$ correspond!
² For simplicity, assume $\mathrm{rank}(X) = \min\{D, N\}$.
³ Define $\Sigma^2 = \Sigma\Sigma^\top \in \mathbb{R}^{D \times D}$.
16. PCA: relation to the SVD (2)
Next, rearranging the SVD expression $X = U\Sigma V^\top$ gives $X^\top U = V\Sigma^\top$. Written componentwise,
$x_n^\top u_m = \sigma_m v_{nm}$.
In PCA, $u_m^\top x_n = x_n^\top u_m$ was the projection of the data point $x_n$ onto the axis $u_m$,⁴ and the singular value $\sigma_m$ sets the scale of axis $m$, so $V = (v_{nm})$ is the scale-normalized representation of the projected data.
⁴ See slide 5.
17. PCA: relation to the SVD (3)
To summarize, given the singular value decomposition $X = U\Sigma V^\top$ of the data matrix $X \in \mathbb{R}^{D \times N}$:
- the left singular vectors $u_m$ correspond to the principal axes of PCA
- the singular values and the PCA eigenvalues correspond via $\lambda_m = \sigma_m^2 / N$
- the right singular vectors $v_n$ correspond to the data projected into the principal subspace
The SVD's computational algorithms are said to be more stable than those for eigendecomposition, and scikit-learn's PCA internally uses the SVD.
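The correspondence $\lambda_m = \sigma_m^2 / N$ is a one-liner to confirm; a sketch on toy data (column-wise samples, as on the slides; names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 100))               # D = 5, N = 100; columns are samples
X -= X.mean(axis=1, keepdims=True)          # zero-mean, as assumed on slide 15
N = X.shape[1]

lam, W = np.linalg.eigh(X @ X.T / N)        # eigendecomposition of S = XX^T / N
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)  # SVD of the data matrix

print(np.allclose(np.sort(lam)[::-1], sigma**2 / N))  # True: lambda_m = sigma_m^2 / N
```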
18. PCA: practice (1)
Let's apply PCA to the Wine dataset:
- $D = 13$ features such as alcohol content, malic acid content, and hue
- $N = 178$ samples
- originally a dataset for a 3-class classification problem
- reduce to 2 dimensions for visualization
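For reference, a minimal scikit-learn sketch of this experiment (the deck does not show its exact preprocessing, so the standardization step here is an assumption):

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_wine(return_X_y=True)          # N = 178, D = 13
X = StandardScaler().fit_transform(X)      # normalization (see the caveat on slide 21)

Z = PCA(n_components=2).fit_transform(X)   # project onto the top-2 principal axes
print(Z.shape)                             # (178, 2); scatter-plot Z colored by y
```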
19. PCA: practice (2)
[Figure: 2-D scatter plot "Linear PCA (wine)".]
Even with just two principal axes, enough information is retained to more or less discriminate the classes.
20. PCA: practice (4)
The cumulative contribution ratio, defined by
$c_M = \frac{\sum_{m=1}^M \lambda_m}{\sum_{m=1}^D \lambda_m}$,
is another important guide for choosing the number of dimensions.
[Figure: "PCA cumulative contribution rates", $c_M$ vs. PC number, for $M = 1, \ldots, 13$.]
$c_{10}$ already exceeds 95%.
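In scikit-learn the cumulative contribution ratio comes straight from explained_variance_ratio_; a small sketch continuing the Wine example above:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_wine(return_X_y=True)
pca = PCA().fit(StandardScaler().fit_transform(X))

c = np.cumsum(pca.explained_variance_ratio_)  # c_M for M = 1, ..., D
M = int(np.argmax(c > 0.95)) + 1              # smallest M with c_M > 95%
print(M, c[M - 1])
```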
21. PCA: summary and caveats
Summary:
- derived PCA from both the variance-maximization and error-minimization viewpoints
- the eigenvalues in PCA mean the variance along an axis, or the squared error along the discarded axes
- PCA is in fact a procedure equivalent to the SVD
- looking at the eigenvalues tells you roughly up to which principal axis carries information
- there are also quantitative methods for choosing the number of axes (information criteria, etc.)
Caveats:
- normalizing the data as preprocessing is all but mandatory
- an axis with a large scale inflates the variance and the error
22. Linear discriminant analysis: intro
[Figure: data that look linearly separable in the 2-D space ↓ can we manage even in 1-D?]
Linear discriminant analysis (LDA) is a supervised dimensionality-reduction method that finds low-dimensional axes preserving linear separability as much as possible. It goes by various names, such as FLDA (Fisher's LDA) and FDA.⁵
⁵ Plainly calling it LDA risks confusion with Latent Dirichlet Allocation.
23. LDA: maximizing the "difference of means"? (1)
Data: $\{(y_n, x_n)\}_{n=1}^N$, $y_n \in \{0, 1\}$, $x_n \in \mathbb{R}^D$.
Discriminant function: $f(x) = w^\top x$; predict class 0 if $f(x) \geq w_0$ and class 1 if $f(x) < w_0$.
Also define the index set of each class, $C_k = \{n \in \{1, \ldots, N\} \mid y_n = k\}$ $(k = 0, 1)$.
First, let's consider maximizing the difference of the two class means after projection. With $\|w\|^2 = 1$ and $m_k = \frac{1}{N_k}\sum_{n \in C_k} x_n$ (the mean of class $k$, with $N_k = |C_k|$), the difference of the projected class means $\tilde{m}_0, \tilde{m}_1$ on the axis $w$ is
$\tilde{m}_1 - \tilde{m}_0 = w^\top(m_1 - m_0)$.
24. LDA: maximizing the "difference of means"? (2)
Maximizing the previous page's $\tilde{m}_1 - \tilde{m}_0 = w^\top(m_1 - m_0)$ reduces, via the method of Lagrange multipliers, to:
Maximize $L(w, \lambda) = w^\top(m_1 - m_0) + \lambda(1 - w^\top w)$.
Taking the first-order optimality condition in $w$,
$\frac{\partial L}{\partial w} = (m_1 - m_0) - 2\lambda w = 0 \iff w = \frac{1}{2\lambda}(m_1 - m_0)$.
Since we only need to fix the direction of the axis, $w \propto m_1 - m_0$ is OK.
25. LDA: let's consider the variance too (1)
Let the within-class variance after projection be
$s_k^2 = \sum_{n \in C_k} (f(x_n) - \tilde{m}_k)^2 = \sum_{n \in C_k} (w^\top x_n - \tilde{m}_k)^2$,
and maximize the following objective⁶:
$J(w) = \frac{(\tilde{m}_1 - \tilde{m}_0)^2}{s_0^2 + s_1^2}$.
This should let us simultaneously maximize the "difference of means" and minimize the within-class variance after projection.
⁶ PRML calls this the Fisher criterion, but there seems to be no consensus name for it.
26. LDA: let's consider the variance too (2)
Here define the following between-class and within-class variances⁷:
$S_B = (m_1 - m_0)(m_1 - m_0)^\top$ : between-class variance,
$S_W = \sum_{k \in \{0,1\}} \sum_{n \in C_k} (x_n - m_k)(x_n - m_k)^\top$ : within-class variance.
Then the numerator of $J(w) = (\tilde{m}_1 - \tilde{m}_0)^2 / (s_0^2 + s_1^2)$ becomes
$(\tilde{m}_1 - \tilde{m}_0)^2 = w^\top(m_1 - m_0)(m_1 - m_0)^\top w = w^\top S_B w$,
and the denominator is
$s_0^2 + s_1^2 = \sum_{k \in \{0,1\}}\sum_{n \in C_k}(w^\top x_n - \tilde{m}_k)^2 = w^\top \sum_{k \in \{0,1\}}\sum_{n \in C_k}(x_n - m_k)(x_n - m_k)^\top w = w^\top S_W w$.
⁷ These are no longer covariance matrices as such, so they are also called scatter matrices.
27. LDA: let's consider the variance too (3)
We found⁸
$J(w) = \frac{(\tilde{m}_1 - \tilde{m}_0)^2}{s_0^2 + s_1^2} = \frac{w^\top S_B w}{w^\top S_W w}$,
so let's consider this maximization problem. By scaling $w$ appropriately, the denominator should be reducible to $w^\top S_W w = 1$. So consider the following problem:
Maximize $w^\top S_B w$ subject to $1 - w^\top S_W w = 0$.
The Lagrangian is $L(w, \lambda) = w^\top S_B w + \lambda(1 - w^\top S_W w)$.
⁸ This form is called a Rayleigh quotient.
28. LDA: let's consider the variance too (4)
For $L(w, \lambda) = w^\top S_B w + \lambda(1 - w^\top S_W w)$, solving the first-order optimality condition in $w$ as usual,
$\frac{\partial L}{\partial w} = 2S_B w - 2\lambda S_W w = 0 \iff S_B w = \lambda S_W w$.
This is a problem class known as the generalized eigenvalue problem. Here, as long as the data are not rank-deficient, $S_W$ is nonsingular, so it can be reduced to the ordinary eigenvalue problem
$S_W^{-1} S_B w = \lambda w$.
29. LDA: let's consider the variance too (5)
$S_W^{-1} S_B w = \lambda w$. In fact, this time we don't even need to solve the eigenvalue problem. Since
$S_B w = (m_1 - m_0)(m_1 - m_0)^\top w \propto m_1 - m_0$
(note $(m_1 - m_0)^\top w$ is a scalar),
$w = \lambda^{-1} S_W^{-1} S_B w \propto S_W^{-1}(m_1 - m_0)$.
Only the direction of $w$ is needed and scalar coefficients can be ignored, so
$w \propto S_W^{-1}(m_1 - m_0)$
gives the projection axis we seek. If $S_W$ is isotropic, this is equivalent to the "difference of means" maximization!
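The closed-form axis $w \propto S_W^{-1}(m_1 - m_0)$ fits in a few lines of NumPy; a sketch (the function name is mine):

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Binary Fisher LDA axis: w proportional to S_W^{-1} (m1 - m0), y in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter S_W (sum over both classes, no 1/N factor)
    SW = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(SW, m1 - m0)    # solve S_W w = m1 - m0
    return w / np.linalg.norm(w)        # only the direction matters
```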
30. LDA: generalization to multiple dimensions and classes (1)
So far we have only considered the binary classification problem, projecting $\mathbb{R}^D \to \mathbb{R}$. As a natural wish, we want dimensionality reduction $\mathbb{R}^D \to \mathbb{R}^M$ $(M < D)$ for a $K$-class classification problem. The problem setup is thus:
Data: $\{(y_n, x_n)\}_{n=1}^N$, $y_n \in \{0, \ldots, K-1\}$, $x_n \in \mathbb{R}^D$;
Discriminant function: $f(x) = W^\top x \in \mathbb{R}^M$, $W \in \mathbb{R}^{D \times M}$.
31. LDA: generalization to multiple dimensions and classes (2)
With $N_k$ the number of data points belonging to class $k$, define the between-class and within-class variances as
$S_B = \sum_{k=0}^{K-1} N_k (m_k - m)(m_k - m)^\top$,
$S_W = \sum_{k=0}^{K-1} \sum_{n \in C_k} (x_n - m_k)(x_n - m_k)^\top$,
where $m_k$ and $m$ are the class means and the overall mean, respectively:
$m_k = \frac{1}{N_k}\sum_{n \in C_k} x_n$,  $m = \frac{1}{N}\sum_{n=1}^N x_n = \frac{1}{N}\sum_{k=0}^{K-1} N_k m_k$.
32. LDA: generalization to multiple dimensions and classes (3)
For projection to 1 dimension we considered maximizing
$J(w) = \frac{w^\top S_B w}{w^\top S_W w}$.
Naively extending this to $(W^\top S_B W)(W^\top S_W W)^{-1}$ is no longer scalar-valued, taking us outside the scope of ordinary optimization problems. So, from the viewpoint of "maximize the between-class variance and minimize the within-class variance along each projected axis", the objective
$J(W) = \frac{\mathrm{tr}(W^\top S_B W)}{\mathrm{tr}(W^\top S_W W)}$
might do the trick!
33. LDA: generalization to multiple dimensions and classes (4)
The LDA objective
$J(W) = \frac{\mathrm{tr}(W^\top S_B W)}{\mathrm{tr}(W^\top S_W W)}$
is called the trace ratio form [1]. However, the trace ratio optimization problem is nonconvex, so the following ratio trace form, a convex relaxation, is the more mainstream choice:
$J(W) = \mathrm{tr}\{(W^\top S_B W)(W^\top S_W W)^{-1}\}$.
The ratio trace form is equivalent to the objective
$J(W) = \frac{\det(W^\top S_B W)}{\det(W^\top S_W W)}$.
34. LDA: generalization to multiple dimensions and classes (5)
This time we consider maximizing the ratio trace form
$J(W) = \mathrm{tr}\{(W^\top S_B W)(W^\top S_W W)^{-1}\}$.
However, to remove the superfluous degrees of freedom, we impose the constraint $W^\top S_W W = I$. Then $W^\top S_W W$ inside $\mathrm{tr}(\cdot)$ also becomes the identity matrix, so the primal problem is:
Maximize $\mathrm{tr}(W^\top S_B W)$ subject to $I - W^\top S_W W = O$.
The Lagrangian, with $\langle\cdot,\cdot\rangle$ the matrix inner product, is
$L(W, \Lambda) = \mathrm{tr}(W^\top S_B W) + \langle \Lambda, I - W^\top S_W W \rangle$.
35. LDA: generalization to multiple dimensions and classes (6)
Let's prepare the matrix calculus needed to solve the Lagrangian. First, the matrix inner product is simply the sum of elementwise products, defined as
$\langle A, B \rangle = \sum_i \sum_j A_{ij} B_{ij}$,
and
$\langle A, B \rangle = \mathrm{tr}(A^\top B)$
holds. Next, as a differentiation formula for traces of quadratic forms,
$\frac{\partial}{\partial X}\mathrm{tr}(C X^\top B X) = BXC + B^\top X C^\top$
holds.⁹
⁹ The Matrix Cookbook [2], Eq. (117).
36. LDA: generalization to multiple dimensions and classes (7)
By the relation between the inner product and the trace, the Lagrangian becomes
$L(W, \Lambda) = \mathrm{tr}(W^\top S_B W) + \langle \Lambda, I - W^\top S_W W \rangle = \mathrm{tr}(W^\top S_B W) + \mathrm{tr}(\Lambda) - \mathrm{tr}(\Lambda W^\top S_W W)$,
so taking the partial derivative with respect to $W$:
$\frac{\partial L}{\partial W} = S_B W + S_B^\top W - S_W W \Lambda - S_W^\top W \Lambda^\top = 2 S_B W - 2 S_W W \Lambda = O \iff S_B W = S_W W \Lambda$.
With $\Lambda = \mathrm{diag}(\lambda_m)$ and $W = (w_m)$,
$S_B w_m = \lambda_m S_W w_m$
— again reduced to a generalized eigenvalue problem!
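SciPy solves the generalized symmetric eigenproblem directly; a sketch of the multiclass version (the function name is mine; assumes $S_W$ is nonsingular, as on slide 28):

```python
import numpy as np
from scipy.linalg import eigh

def lda_axes(X, y, M):
    """Top-M solutions of S_B w = lambda S_W w for labels y in {0, ..., K-1}."""
    D = X.shape[1]
    m = X.mean(axis=0)
    SB, SW = np.zeros((D, D)), np.zeros((D, D))
    for k in np.unique(y):
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        SB += len(Xk) * np.outer(mk - m, mk - m)   # between-class scatter
        SW += (Xk - mk).T @ (Xk - mk)              # within-class scatter
    lam, W = eigh(SB, SW)       # generalized eigenproblem, ascending eigenvalues
    return W[:, ::-1][:, :M]    # axes for the M largest eigenvalues
```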
37. LDA: summary and caveats
Summary:
- derived LDA and its generalization to multiple dimensions and classes
- showed that LDA reduces to a generalized eigenvalue problem
Caveats:
- LDA implicitly assumes normality and homoscedasticity of each class
- QDA (Quadratic Discriminant Analysis) is a generalization that drops the equal-variance assumption
38. Kernel PCA: toward nonlinear dimensionality reduction
[Figure: data that linear PCA cannot untangle become separable after a nonlinear transformation to 3 dimensions followed by PCA.]
What if we could define a projection into a suitable high-dimensional space ...?
39. Kernel PCA: introducing the kernel method (1)
That said, we usually have no idea what a "suitable high-dimensional space" is, so let's set that question aside for now. First, revisit the objective of PCA (variance maximization):
$u^\top S u = u^\top \Bigl\{\frac{1}{N}\sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^\top\Bigr\} u = \frac{1}{N}\sum_{n=1}^N \{u^\top(x_n - \bar{x})\}^2$.
Without worrying about its concrete functional form, introduce a map $\Phi$ that transforms data points nonlinearly, $x \mapsto \Phi(x) \in \mathcal{H}$. Writing the "axis" in the space $\mathcal{H}$ that $\Phi$ maps into as $f \in \mathcal{H}$, the objective becomes
$\frac{1}{N}\sum_{n=1}^N \langle f, \tilde{\Phi}(x_n) \rangle^2$,  $\tilde{\Phi}(x) = \Phi(x) - \frac{1}{N}\sum_{n=1}^N \Phi(x_n)$.
40. Kernel PCA: introducing the kernel method (2)
The objective became $\frac{1}{N}\sum_{n=1}^N \langle f, \tilde{\Phi}(x_n) \rangle^2$ → apparently the inner product is what matters. So, somewhat top-down, let $\mathcal{H}$ be a linear space equipped with an inner product. Once an inner product is defined, a norm is also determined in the form $\|f\| = \langle f, f \rangle^{1/2}$, so the condition $\|f\| = 1$ can be imposed. Therefore it suffices to consider:
Maximize $\frac{1}{N}\sum_{n=1}^N \langle f, \tilde{\Phi}(x_n) \rangle^2$ subject to $1 - \|f\| = 0$.
41. Kernel PCA: introducing the kernel method (3)
We now know that if we can somehow solve the problem
Maximize $\frac{1}{N}\sum_{n=1}^N \langle f, \tilde{\Phi}(x_n) \rangle^2$ subject to $1 - \|f\| = 0$,
we can realize nonlinear dimensionality reduction (maybe). But how on earth ...?
→ In fact, it suffices to search for the axis $f$ in the form
$f = \sum_{n=1}^N a_n \tilde{\Phi}(x_n)$!
Let's show this fact.
42. Kernel PCA: introducing the kernel method (4)
Decompose $f$ into its component $f_0$ in the span of $\{\tilde{\Phi}(x_n)\}$ and the orthogonal component $f_\perp$:
$f = f_0 + f_\perp$.
By the properties of the inner product,
$\langle f, \tilde{\Phi}(x_n) \rangle = \langle f_0, \tilde{\Phi}(x_n) \rangle + \langle f_\perp, \tilde{\Phi}(x_n) \rangle = \langle f_0, \tilde{\Phi}(x_n) \rangle$
— $f_\perp$ does not contribute. As for the constraint $\|f\| = 1$, by the Pythagorean theorem
$\|f\|^2 = \|f_0\|^2 + \|f_\perp\|^2 = 1$,
so we may set $\|f_\perp\| = 0$ and consider only the $f_0$ component!¹⁰
¹⁰ The theorem stating this fact in general is called the representer theorem.
43. Kernel PCA: introducing the kernel method (5)
By the above discussion, the problem $\max_{\|f\|=1} \frac{1}{N}\sum_{n=1}^N \langle f, \tilde{\Phi}(x_n) \rangle^2$ can be transformed as
$\max_{a_1,\ldots,a_N} \frac{1}{N}\sum_{n=1}^N \Bigl\langle \sum_{m=1}^N a_m \tilde{\Phi}(x_m), \tilde{\Phi}(x_n) \Bigr\rangle^2 = \frac{1}{N}\sum_{n=1}^N \Bigl(\sum_{m=1}^N a_m \langle \tilde{\Phi}(x_m), \tilde{\Phi}(x_n) \rangle\Bigr)^2$.
Defining the centered Gram matrix $\tilde{K} \in \mathbb{R}^{N \times N}$ by $\tilde{K}_{nm} = \langle \tilde{\Phi}(x_n), \tilde{\Phi}(x_m) \rangle$ and setting $a = (a_n)_{n=1}^N$, the objective becomes
$\max_a \frac{1}{N} a^\top \tilde{K}^2 a$.
44. Kernel PCA: introducing the kernel method (6)
Next, looking afresh at the constraint $\|f\| = 1$:
$\|f\|^2 = \Bigl\langle \sum_{n=1}^N a_n \tilde{\Phi}(x_n), \sum_{m=1}^N a_m \tilde{\Phi}(x_m) \Bigr\rangle = a^\top \tilde{K} a = 1$.
Therefore it turns out we should solve the maximization problem¹¹
Maximize $a^\top \tilde{K}^2 a$ subject to $1 - a^\top \tilde{K} a = 0$.
However, the problem of how to handle the map $\Phi$ still remains ...
¹¹ The constant factor in the objective is omitted.
45. Kernel PCA: introducing the kernel method (7)
Expanding the definition of the centered Gram matrix,
$\tilde{K}_{nm} = \langle \tilde{\Phi}(x_n), \tilde{\Phi}(x_m) \rangle = \Bigl\langle \Phi(x_n) - \frac{1}{N}\sum_{i=1}^N \Phi(x_i), \Phi(x_m) - \frac{1}{N}\sum_{j=1}^N \Phi(x_j) \Bigr\rangle = \langle \Phi(x_n), \Phi(x_m) \rangle - \frac{1}{N}\sum_{i=1}^N \langle \Phi(x_i), \Phi(x_m) \rangle - \frac{1}{N}\sum_{j=1}^N \langle \Phi(x_n), \Phi(x_j) \rangle + \frac{1}{N^2}\sum_{i,j} \langle \Phi(x_i), \Phi(x_j) \rangle$,
we see it is determined entirely by the inner products $\langle \Phi(x_i), \Phi(x_j) \rangle$.
So let's take the plunge: never handle the map $\Phi$ explicitly, and specify only how the inner products $\langle \Phi(x_i), \Phi(x_j) \rangle$ are computed. That is, define only the form of the kernel function $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$.
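In matrix form the expansion above reads $\tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N$, where $1_N$ is the $N \times N$ matrix whose entries are all $1/N$; a minimal sketch:

```python
import numpy as np

def center_gram(K):
    """Centered Gram matrix: K~ = K - 1N K - K 1N + 1N K 1N (1N = ones / N)."""
    N = K.shape[0]
    ones = np.full((N, N), 1.0 / N)
    return K - ones @ K - K @ ones + ones @ K @ ones
```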
46. Kernel PCA: introducing the kernel method (8)
The kernel method: a strategy for extending linear statistical analysis methods rooted in inner-product computations to nonlinear ones, by specifying the inner product after a nonlinear transformation of the data points through a kernel function $k(\cdot,\cdot)$.
The kernelization of PCA: kernel principal component analysis (kernel PCA).
Not just any kernel function $k$ will do: the Gram matrix
$K = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \cdots & k(x_N, x_N) \end{pmatrix}$
must be positive semidefinite for any $\{x_n\}$.¹²
¹² A $k(\cdot,\cdot)$ satisfying this is specifically called a positive-definite kernel.
47. Kernel PCA: solving kernel PCA (1)
The primal problem of kernel PCA was given as:
Maximize $a^\top \tilde{K}^2 a$ subject to $1 - a^\top \tilde{K} a = 0$.
The Lagrangian is
$L(a, \lambda) = a^\top \tilde{K}^2 a + \lambda(1 - a^\top \tilde{K} a)$,
and the first-order optimality condition in $a$ is
$\frac{\partial L(a, \lambda)}{\partial a} = 2\tilde{K}^2 a - 2\lambda \tilde{K} a = 0$,
so we arrive at the generalized eigenvalue problem $\tilde{K}^2 a = \lambda \tilde{K} a$.
48. Kernel PCA: solving kernel PCA (2)
The kernel PCA solution $\tilde{K}^2 a = \lambda \tilde{K} a$ can be turned into the eigenvalue problem $\tilde{K} a = \lambda a$ if $\tilde{K}$ is nonsingular.¹³
¹³ In fact, even if $\tilde{K}$ is only positive semidefinite it can be reduced to an eigenvalue problem; see e.g. [3] (p. 7).
49. Kernel PCA: choosing a positive-definite kernel (1)
In the end, the kernel PCA procedure is:
1. choose a kernel function $k(\cdot,\cdot)$ appropriately
2. compute the centered Gram matrix from the observed data points and $k(\cdot,\cdot)$
3. solve the (generalized) eigenvalue problem to determine the axes
By the way, how do we choose the kernel function $k(\cdot,\cdot)$ after all?
→ Mostly, one uses kernel functions whose properties are already well known.
50. Kernel PCA: choosing a positive-definite kernel (2)
Representative kernel functions:
Polynomial kernel: $k(x, x') = (c + x^\top x')^d$  $(c \geq 0,\ d \in \mathbb{N})$
Gaussian (RBF) kernel: $k(x, x') = \exp(-\beta \|x - x'\|^2)$  $(\beta > 0)$
The polynomial kernel amounts to considering a map into the space expressed by degree-$d$ polynomials of the original data space; the Gaussian kernel, a map into an infinite-dimensional (!) space.¹⁴
¹⁴ Intuitively, because $\exp$ expands into a polynomial of infinite degree.
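Both kernels take a couple of lines each over a data matrix X whose rows are samples; a sketch (function names are mine):

```python
import numpy as np

def poly_gram(X, c=1.0, d=2):
    """Polynomial kernel Gram matrix: k(x, x') = (c + x^T x')^d."""
    return (c + X @ X.T) ** d

def gauss_gram(X, beta=1.0):
    """Gaussian (RBF) kernel Gram matrix: k(x, x') = exp(-beta ||x - x'||^2)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-beta * sq)
```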
51. Kernel PCA: choosing a positive-definite kernel (3)
An example of the Gaussian kernel ($x, x' \in \mathbb{R}$; the horizontal axis is $x - x'$).
[Figure: "Gauss kernel" similarity curves for beta = 1.0, 0.3, 0.2.]
Kernel PCA results depend heavily on the hyperparameters.
52. Kernel PCA: choosing a positive-definite kernel (4)
Kernel PCA is unsupervised learning, so cross-validation is unavailable
→ determining the kernel parameters is in general quite hard.
- If the dimensionality reduction is preprocessing for supervised learning: cross-validation
- To keep the Gram matrix values from blowing up: the median heuristic
Median heuristic: setting the Gaussian kernel parameter $\beta$ to
$\beta = 1/\mathrm{median}\{\|x_n - x_m\|^2 \mid n < m\}$
keeps the Gram matrix values tame, and the results are often reasonably good.
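Putting the pieces together: kernel PCA with a Gaussian kernel and the median heuristic, as a sketch (the function name is mine; assumes the top-$M$ eigenvalues used are positive):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def kernel_pca(X, M, beta=None):
    """Kernel PCA with a Gaussian kernel; beta defaults to the median heuristic."""
    sq = squareform(pdist(X, "sqeuclidean"))       # all pairwise ||x_n - x_m||^2
    if beta is None:
        beta = 1.0 / np.median(sq[np.triu_indices_from(sq, k=1)])
    K = np.exp(-beta * sq)                         # Gram matrix

    ones = np.full(K.shape, 1.0 / len(K))          # centering (slide 45)
    Kt = K - ones @ K - K @ ones + ones @ K @ ones

    lam, A = np.linalg.eigh(Kt)                    # eigenproblem K~ a = lambda a
    lam, A = lam[::-1][:M], A[:, ::-1][:, :M]      # top-M eigenpairs
    A = A / np.sqrt(np.clip(lam, 1e-12, None))     # rescale so that a^T K~ a = 1
    return Kt @ A                                  # projections <f_m, Phi~(x_n)>
```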
53. Kernel PCA: practice (1)
Boston Housing, a dataset of housing prices by Boston district:
- $D = 12$ features such as per-capita crime rate, nitric oxide concentration, and average number of rooms
- the target variable is the median housing price within each area
- $N = 506$ samples
- reduce the features to 2 dimensions for visualization
54. Kernel PCA: practice (2)
[Figure: three 2-D embeddings of the Boston data, colored by target value: "Linear PCA (Boston)", "Kernel PCA (Boston, Poly-2)", "Kernel PCA (Boston, Gaussian w/ med. heuristics)".]
Degree-2 polynomial kernel: the high-price samples get squashed. Gaussian kernel: the separation works, but the embedding looks warped.
55. Kernel PCA: practice (3)
Let's try it with 2000 samples of MNIST.
[Figure: "Linear PCA (MNIST)" vs. "Kernel PCA (MNIST, Gauss w/ med. heuristics)", colored by digit 0-9.]
Digits like "1" and "9" are captured better by kernel PCA.
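A rough scikit-learn sketch of this experiment (the deck's exact subset and preprocessing are unknown, so the rescaling below is an assumption; KernelPCA's gamma plays the role of $\beta$ in the slides):

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import fetch_openml
from sklearn.decomposition import KernelPCA

X, y = fetch_openml("mnist_784", return_X_y=True, as_frame=False)
X = X[:2000].astype(float) / 255.0          # 2000 samples, rescaled to [0, 1]

beta = 1.0 / np.median(pdist(X, "sqeuclidean"))   # median heuristic (slide 52)
Z = KernelPCA(n_components=2, kernel="rbf", gamma=beta).fit_transform(X)
print(Z.shape)                              # (2000, 2); scatter-plot Z colored by y
```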
56. Kernel PCA: summary and caveats
Summary:
- derived kernel PCA as an extension of variance-maximization PCA
- the results depend heavily on the kernel function and its parameters
- since it is unsupervised, choosing the parameters well is quite hard
- if supervised learning follows the dimensionality reduction, CV can be used
- introduced a heuristic for choosing the Gaussian kernel parameter
Caveats:
- linear PCA is the safe default
- without strong motivation for nonlinear dimensionality reduction, one should use linear PCA, which is parameter-free and easier to interpret (author's opinion)
- beware of overfitting
- the Gaussian kernel and the like are so expressive that, especially for high-dimensional data, you easily end up with embeddings that "have values only where the data points are"
57. References
[1] Wang, H., Yan, S., Xu, D., Tang, X., and Huang, T. (2007), "Trace Ratio vs. Ratio Trace for Dimensionality Reduction," in 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN: IEEE, pp. 1-8.
[2] Petersen, K. B., and Pedersen, M. S. (2012), "The Matrix Cookbook," Version November 15, 2012.
[3] Fukumizu, K. (2010), "Introduction to Kernel Methods: Data Analysis with Positive-Definite Kernels" (カーネル法入門 -正定値カーネルによるデータ解析-), Asakura Shoten (in Japanese).