
Importance Weighting and its Applications in Machine Learning

Masanari Kimura
November 13, 2023


Transcript

  1. Importance Weighting and its Applications in Machine Learning
    Masanari Kimura
    Graduate University for Advanced Studies, SOKENDAI
    Department of Statistical Science, School of Multidisciplinary Sciences
    [email protected]
    November 13, 2023
    Masanari Kimura (SOKENDAI) Importance Weighting and its Applications November 13, 2023 1 / 80

  2. Contents
    1 Overview
    2 Types of Distribution Shift
    3 Domain Adaptation
    4 Active Learning
    5 Distributionally Robust Optimization
    6 Model Calibration
    7 Positive-Unlabelled (PU) Learning
    8 Label Noise Correction
    9 Density Ratio Estimation
    10 Importance Weighting and Deep Learning


  4. Overview
    Importance weighting is the operation of weighting instances according to some notion of importance:
    e.g., for S = \{x_i\}_{i=1}^n, f(S) := \sum_{x \in S} \phi(x) \Rightarrow f_w(S) := \sum_{x \in S} w(x) \phi(x).
    It has a wide range of applications in machine learning.
    How to define or estimate w(x) is also important.
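The weighting operation above can be sketched in a few lines of Python (a toy φ and a constant w, purely illustrative):

```python
def f(S, phi):
    # unweighted statistic: f(S) = Σ_{x∈S} φ(x)
    return sum(phi(x) for x in S)

def f_w(S, phi, w):
    # importance-weighted statistic: f_w(S) = Σ_{x∈S} w(x) φ(x)
    return sum(w(x) * phi(x) for x in S)

S = [0.0, 1.0, 2.0]
phi = lambda x: x ** 2
w = lambda x: 2.0  # toy weight; in the applications below, w is e.g. a density ratio
print(f(S, phi), f_w(S, phi, w))  # 5.0 10.0
```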

  5. (Figure-only slide; no transcript text.)


  7. Types of Distribution Shift
    Distribution shift adaptation is one of the main arenas for importance weighting.
    Distribution shift is the setting where the probability distributions of the training data and the test data differ.
    Depending on which variables are involved, it can be categorized as follows:
    Covariate shift
    Target shift
    Sample selection bias
    Subpopulation shift
    Feedback shift

  8. Empirical Risk Minimization
    Common machine learning algorithms assume that the training data and the test data follow the same probability distribution (the i.i.d. assumption).

    In particular, the validity of supervised learning under the i.i.d. assumption rests on the statistical properties of Empirical Risk Minimization (ERM): minimizing the empirical risk \hat{R} leads to minimizing the expected risk R.

    Example: unbiasedness of ERM
    ERM aims at minimizing the loss on unseen data by minimizing some loss function \ell : Y \times Y \to [0, \infty) over the empirically obtained dataset D:
    \hat{h} = \arg\min_{h \in H} \hat{R}(h) = \arg\min_{h \in H} \frac{1}{|D|} \sum_{(x,y) \in D} \ell(h(x), y). (1)
    When the training distribution p_tr and the test distribution p_te can be assumed to be identical, ERM is unbiased:
    E_{p_tr}[\hat{R}] = R. (2)

  9. Covariate Shift
    The distributional identity assumed by ERM is often violated in real-world problems.
    The Covariate Shift Assumption states that the distribution of the covariates differs between training and test time [65].
    Covariate Shift Assumption
    For the training distribution p_tr and the test distribution p_te:
    p_tr(x) \ne p_te(x),
    p_tr(y|x) = p_te(y|x).
    Under this assumption, the unbiasedness of ERM no longer holds.

  10. Importance Weighted Empirical Risk Minimization
    Weighting the loss by the density ratio of the covariates, p_te(x)/p_tr(x), recovers the unbiasedness of ERM (IWERM [65]):

    E_{p_tr(x,y)} \left[ \frac{p_te(x)}{p_tr(x)} \ell(h(x), y) \right]
      = \int_{X \times Y} \frac{p_te(x)}{p_tr(x)} \ell(h(x), y) \, p_tr(x, y) \, dx dy
      = \int_{X \times Y} \frac{p_te(x)}{p_tr(x)} \ell(h(x), y) \, p_tr(x) p_tr(y|x) \, dx dy
      = \int_{X \times Y} p_te(x) \ell(h(x), y) \, p_tr(y|x) \, dx dy
      = \int_{X \times Y} p_te(x) \ell(h(x), y) \, p_te(y|x) \, dx dy
      = \int_{X \times Y} \ell(h(x), y) \, p_te(x, y) \, dx dy
      = E_{p_te(x,y)} [\ell(h(x), y)]. (3)
    The fourth equality uses the covariate shift assumption p_tr(y|x) = p_te(y|x).
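Identity (3) can be checked numerically. A minimal sketch with a discrete covariate (the distributions and losses are chosen arbitrarily for illustration):

```python
import numpy as np

# Toy discrete covariate x ∈ {0, 1}.
p_tr = np.array([0.8, 0.2])   # training distribution p_tr(x)
p_te = np.array([0.3, 0.7])   # test distribution p_te(x)
loss = np.array([1.0, 5.0])   # per-x loss ℓ(h(x), y), fixed h and y for simplicity

w = p_te / p_tr               # importance weights p_te(x) / p_tr(x)

# E_{p_tr}[w(x) ℓ] reproduces E_{p_te}[ℓ] exactly, as in Eq. (3).
lhs = np.sum(p_tr * w * loss)
rhs = np.sum(p_te * loss)
print(lhs, rhs)   # both equal 0.3*1 + 0.7*5 = 3.8
```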

  11. Variants of IWERM
    Several variants have been proposed to remedy the numerical instability of IWERM observed in experiments:
    Adaptive Importance Weighted ERM (AIWERM [65])

    \hat{h} = \arg\min_{h \in H} \frac{1}{|D|} \sum_{(x,y) \in D} w_A(x) \ell(h(x), y),  w_A(x) = \left( \frac{p_te(x)}{p_tr(x)} \right)^{\lambda}. (4)
    Relative Importance Weighted ERM (RIWERM [86])

    \hat{h} = \arg\min_{h \in H} \frac{1}{|D|} \sum_{(x,y) \in D} w_R(x) \ell(h(x), y),  w_R(x) = \frac{p_te(x)}{(1 - \lambda) p_tr(x) + \lambda p_te(x)}. (5)
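The two weight functions in Eqs. (4) and (5) are easy to write down directly; a small sketch with toy density values (not estimated from data):

```python
def w_aiw(p_te, p_tr, lam):
    # AIWERM: flattened density ratio (p_te/p_tr)^λ, Eq. (4)
    return (p_te / p_tr) ** lam

def w_riw(p_te, p_tr, lam):
    # RIWERM: relative density ratio, Eq. (5)
    return p_te / ((1 - lam) * p_tr + lam * p_te)

p_tr, p_te = 0.1, 0.9
print(w_aiw(p_te, p_tr, 0.0))  # 1.0 -> plain ERM
print(w_aiw(p_te, p_tr, 1.0))  # 9.0 -> IWERM's density ratio
print(w_riw(p_te, p_tr, 0.0))  # 9.0 -> IWERM again
print(w_riw(p_te, p_tr, 1.0))  # 1.0 -> fully relative weight, bounded by 1/λ
```

Both families interpolate between unweighted ERM and full IWERM via λ, trading bias for variance.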

  12. Information-Geometric Generalization of IWERM
    The choice of the importance weight w(x) in IWERM and its variants can be identified with the choice of an α-geodesic on the statistical manifold formed by the probability distributions of the data [38].
    \hat{h} = \arg\min_{h \in H} \sum_{x \in D} w^{(\lambda,\alpha)}(x) \ell(h(x), y),  w^{(\lambda,\alpha)}(x) = \frac{m_f^{(\lambda,\alpha)}(p_tr(x), p_te(x))}{p_tr(x)}, (6)
    where
    m_f^{(\lambda,\alpha)}(a, b) = f_\alpha^{-1}\left( (1 - \lambda) f_\alpha(a) + \lambda f_\alpha(b) \right),  f_\alpha(a) = \begin{cases} a^{(1-\alpha)/2} & (\alpha \ne 1) \\ \log a & (\alpha = 1) \end{cases}. (7)
    IWERM is the case λ = 1.
    AIWERM is the case α = 1.
    RIWERM is the case α = 3.
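A sketch of the generalized weight of Eqs. (6)–(7), checking the λ = 1 and α = 1 special cases (scalar density values assumed for illustration):

```python
import numpy as np

def f_alpha(a, alpha):
    # f_α(a) = a^{(1-α)/2} for α ≠ 1, log a for α = 1  (Eq. (7))
    return np.log(a) if alpha == 1 else a ** ((1 - alpha) / 2)

def f_alpha_inv(v, alpha):
    return np.exp(v) if alpha == 1 else v ** (2 / (1 - alpha))

def w_gen(p_tr, p_te, lam, alpha):
    # w^{(λ,α)}(x) = m_f^{(λ,α)}(p_tr, p_te) / p_tr  (Eq. (6))
    m = f_alpha_inv((1 - lam) * f_alpha(p_tr, alpha) + lam * f_alpha(p_te, alpha), alpha)
    return m / p_tr

a, b = 0.2, 0.8
# λ = 1 recovers IWERM's density ratio p_te/p_tr regardless of α:
print(w_gen(a, b, 1.0, alpha=3))                 # 4.0
# α = 1 recovers AIWERM's (p_te/p_tr)^λ:
print(w_gen(a, b, 0.5, alpha=1), (b / a) ** 0.5)  # 2.0 2.0
```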

  13. Other Topics on Importance Weighting under Covariate Shift
    Importance Weighted Cross Validation (IWCV) [69] is a variant of cross validation for model selection under covariate shift.
    In the context of Distributionally Robust Optimization, instead of weighting the training data by the density ratio p_te(x)/p_tr(x), weighting the test data by the inverse density ratio p_tr(x)/p_te(x) is widely used.
    Double-Weighting Covariate Shift Adaptation [49] has been proposed to account for the trade-off between these two kinds of weighting.
    Importance weighting has also been reported to be effective for conformal prediction under covariate shift [72].

  14. Negative Results on Importance Weighting under Covariate Shift
    IWERM and its various estimators all underestimate the expected risk [42].
    When the model is nonparametric and no model misspecification is assumed, importance weighting is unnecessary [26].
    It is necessary, however, in the nonparametric case when model misspecification is assumed.

  15. Target Shift
    Target shift assumes that the distribution of the target variable differs between training and test time [92].
    Target Shift Assumption
    For the training distribution p_tr and the test distribution p_te:
    p_tr(y) \ne p_te(y),
    p_tr(x|y) = p_te(x|y).
    As under the covariate shift assumption, importance weighting is effective.
    Estimating p(y) with the EM algorithm is inefficient because it entails estimating p(x|y) [15].
    When the target variable is continuous, importance weighting via density ratio estimation in a semi-supervised setting is effective [53].

  16. Black Box Shift Estimation
    Black Box Shift Estimation (BBSE) [45] is a method for estimating the importance weights w using a black-box predictor.
    Using the confusion matrix C of a black-box predictor f and the mean output b of f, solve:
    Cw = b. (8)
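Given estimates of C and b, Eq. (8) is a plain linear solve; a sketch with hypothetical numbers for a binary task:

```python
import numpy as np

# C: confusion matrix of the black-box predictor f on held-out source data,
#    C[i, j] = p(f(x) = i, y = j).
# b: mean output distribution of f on the target data (hypothetical values).
C = np.array([[0.55, 0.10],
              [0.05, 0.30]])
b = np.array([0.40, 0.40])

w = np.linalg.solve(C, b)   # per-class importance weights, Eq. (8)
print(w)                    # [0.5  1.25] -> class 1 is up-weighted at test time
```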

  17. Sample Selection Bias
    It is often formulated under the same assumption as covariate shift [75, 81, 78, 6].
    One widely accepted solution introduces a random variable s that models the sample selection bias (s = 1 indicating that an instance is selected) and constructs the test distribution as:

    p_te(x, y) = p(x, y) = \sum_{s} p(x, y, s). (9)
    Under this assumption, the following importance weight is known to be effective [91, 76]:
    w(x) = \frac{P(s = 1)}{P(s = 1 | x)}. (10)

  18. Subpopulation Shift
    Subpopulation shift assumes a change in the distribution of subsets of the data, rather than of single instances [64, 87].
    As in other distribution shift adaptation settings, importance weighting according to the frequency of each subpopulation is effective [13, 18, 46].
    More recently, UMIX [30] introduces importance weights for subpopulation shift into the weighted averaging of instances in mixup data augmentation.

  19. Feedback Shift
    In advertising, it is important to predict which clicks lead to purchase behavior.
    In practice, however, a relatively long time is known to pass between a click and the purchase (Feedback Shift, also called Delayed Feedback) [16, 90, 43, 63, 71].
    The harm caused by feedback shift is that instances that should have received a positive label (a purchase following the click) end up labelled negative because the feedback is delayed.
    Feedback Shift Importance Weighting (FSIW) [88] addresses this problem by importance weighting according to the probability of feedback delay.


  21. Domain Adaptation
    The goal of domain adaptation is to learn a predictor that performs well on a target domain using data from a given source domain [57, 4, 79, 24, 17, 80].
    There is a large body of work on domain adaptation via importance weighting:
    In adversarial learning, the generator learns to assign instance weights that fool the discriminator [50].
    Importance weighting that accounts for the difference in sample size between the source and target domains [83].
    Generalization error analysis of domain adaptation based on importance weighting [1].
    Importance-weighted domain adaptation is also used in NLP tasks [36]. However, it has been reported to fail due to problems specific to NLP tasks, such as word frequencies [59].
    It has also been pointed out that these negative results arise because existing work focuses only on sample selection bias and does not handle sample selection variance [82].

  22. Other Problem Settings in Domain Adaptation
    The domain adaptation problem can be subdivided according to how the data are given and the conditions imposed.
    Multi-source domain adaptation: multiple source domains are available.
    Partial domain adaptation: the target domain has fewer classes than the source domain.
    Open-set domain adaptation: both domains contain unknown classes.
    Universal domain adaptation: no prior knowledge of the label sets is required.


  24. Active Learning
    Active learning is a problem setting built on the premise that a better model can be learned from the same sample size as random sampling, provided the instances added to the training data are chosen well.
    Instance selection strategies in active learning can be broadly classified as:
    uncertainty
    diversity
    representativeness
    reducing expected error
    maximizing expected model changes

  25. Importance Weighted Active Learning
    Importance Weighted Active Learning (IWAL) [9] is one of the best-known active learning methods based on importance weighting.
    IWAL labels an unlabelled instance x_t with probability p_t, based on its features and the history of labelled data collected so far.
    Learning then proceeds with instance x_t weighted by 1/p_t. The probability p_t is determined using the set H_t of models learned on the data available at time t:
    p_t = \max_{f,g \in H_{t+1}} \max_{y} \sigma\left( \ell(f(x_t), y) - \ell(g(x_t), y) \right), (11)
    H_{t+1} = \{ h \in H_t : L_t(h) \le L_t^* + \Delta_t \}, (12)
    IWAL has been shown to be consistent under this importance weighting.
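A toy sketch of the query probability in Eq. (11) for a finite hypothesis set of threshold classifiers, using the 0–1 loss and the raw loss difference in place of σ(·) (simplifications not in [9]; the actual algorithm also clips p_t away from zero):

```python
def query_prob(x_t, hypotheses, labels=(0, 1)):
    # Maximum disagreement in loss over pairs of surviving hypotheses (Eq. (11)):
    # query with high probability only where the hypotheses still disagree.
    def loss(h, y):
        return float(h(x_t) != y)   # 0-1 loss
    return max(abs(loss(f, y) - loss(g, y))
               for f in hypotheses for g in hypotheses for y in labels)

# Hypothetical surviving set H_{t+1}: two threshold classifiers on a scalar x.
H = [lambda x: int(x > 0.3), lambda x: int(x > 0.7)]
print(query_prob(0.5, H))  # 1.0 -> the hypotheses disagree at x = 0.5, always query
print(query_prob(0.9, H))  # 0.0 -> they agree, the label carries no information
```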

  26. Further Discussion of IWAL
    Beygelzimer et al. gave a practical implementation of IWAL by setting the rejection threshold appropriately [10].
    Active learning has a notion of sample reusability [73, 74]: whether a dataset collected by active learning with one learner is also useful for other learners.
    Upon examination, IWAL has been reported not to have sample reusability [77].

  27. Active Learning and Model Misspecification
    It has been pointed out that existing active learning methods are vulnerable to large model misspecification, and suggested that importance weighting can mitigate this effect [68].
    A study of the asymptotic properties of active learning for generalized linear models under model misspecification proposes that effective instance selection depends on importance weighting [2].

  28. Active Learning by Learning
    Active Learning by Learning (ALBL) [31] combines multiple instance selection strategies using a multi-armed bandit framework.
    ALBL extends IWAL by introducing a reward function called Importance Weighted Accuracy, and reports good experimental results.

  29. Evaluating Active Learning Performance
    One challenge in active learning is evaluating model performance during data collection, which is affected by sampling bias and related issues.
    In active learning, naive importance-weighted cross validation has been reported not to work well [41].
    Combining importance weighting with class-balanced sampling [93] reportedly enables proper evaluation.
    There is also a method called active testing, based on loss-proportional sampling [40, 25].


  31. Distributionally Robust Optimization
    Distributionally Robust Optimization (DRO) [60, 27, 20, 8, 44, 19] is the task of optimizing worst-case performance over an uncertainty set U(p_0) around some distribution p_0:
    \mathrm{minimize}_{h \in H} \; R(h; p_0) := \sup_{q \in U(p_0)} E_{(x,y) \sim q(x,y)} \left[ \ell(h(x), y) \right], (13)
    There is much work on how to construct the uncertainty set [5, 7, 11, 22].
    DRO can also be viewed as a worst-case evaluation against unknown distribution shifts.

  32. DRO and Importance Weighting
    One formulation of DRO considers an importance-weighted loss without a known target domain.
    Given a set W of weighting functions, the uncertainty set is constructed as:
    U_w(p_0) = \{ w(x) \cdot p_0(x) ; w \in W \}. (14)
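A finite toy version of Eqs. (13)–(14): the uncertainty set is a handful of hand-picked weight functions, each keeping w·p_0 a valid distribution, and DRO evaluates the worst of them:

```python
import numpy as np

p0 = np.array([0.25, 0.25, 0.25, 0.25])       # base distribution p_0
losses = np.array([0.1, 0.2, 0.4, 0.9])       # per-instance losses ℓ(h(x), y)

# A finite weight-function family W (Eq. (14)); each w * p0 sums to 1.
W = [np.array([1.0, 1.0, 1.0, 1.0]),          # no shift
     np.array([0.4, 0.4, 1.2, 2.0]),          # up-weight the hard instances
     np.array([2.0, 1.2, 0.4, 0.4])]          # up-weight the easy instances

risks = [np.sum(w * p0 * losses) for w in W]  # E_{w·p0}[ℓ] for each shift
worst = max(risks)                            # sup over U_w(p_0), Eq. (13)
print(worst)                                  # 0.6 -> hard-instance shift dominates
```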


  34. Model Calibration
    The softmax output probabilities of a machine learning model are often treated as the model's confidence.
    Such outputs, however, are not necessarily well-calibrated.
    Model calibration is the task of encouraging the model to produce outputs that match the actual frequencies of events.

  35. An Importance Weighting View of Focal Loss
    One of the most widely used tools for model calibration is the Focal Loss [51].
    Focal Loss approaches model calibration by down-weighting instances that are easy to classify.
    This procedure can be interpreted as weighting each instance by w(x_i) = (1 - p_i)^γ, depending on the model's predicted probability p_i.
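The implied weight is a one-liner; a quick sketch (γ = 2 is a common choice, not prescribed by the slide):

```python
import numpy as np

def focal_weight(p, gamma=2.0):
    # Focal-loss style weight w(x_i) = (1 - p_i)^γ on the predicted probability
    # p_i of the true class: confident (easy) examples get a tiny weight.
    return (1.0 - p) ** gamma

p = np.array([0.99, 0.5, 0.1])   # confident, uncertain, badly wrong
print(focal_weight(p))           # ≈ [1e-04, 0.25, 0.81]
```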


  37. Positive-Unlabelled (PU) Learning
    Positive-Unlabelled (PU) learning is the task of learning a binary classifier from only positively labelled samples and unlabelled samples [3, 39].
    In PU learning, no negatively labelled instances are given.
    This is useful, for example, in problem settings where the negative class is hard to define.

  38. Lemma (Elkan and Noto [21])
    Let x be an example and let y ∈ {0, 1} be a binary label. Let s = 1 if the example x is labeled, and let s = 0 if x is unlabeled. Then, assuming labeled examples are selected completely at random from the positives, we have
    p(y = 1 | x) = \frac{p(s = 1 | x)}{p(s = 1 | y = 1)}. (15)
    Proof.
    By assumption, p(s = 1 | y = 1, x) = p(s = 1 | y = 1). Moreover,
    p(s = 1 | x) = p(y = 1 \wedge s = 1 | x)
      = p(y = 1 | x) \, p(s = 1 | y = 1, x)
      = p(y = 1 | x) \, p(s = 1 | y = 1). (16)
    Dividing both sides by p(s = 1 | y = 1) gives the lemma.

  39. For unlabelled data,
    p(y = 1 | x, s = 0) = \frac{p(s = 0 | x, y = 1) \, p(y = 1 | x)}{p(s = 0 | x)}
      = \frac{\left( 1 - p(s = 1 | x, y = 1) \right) p(y = 1 | x)}{1 - p(s = 1 | x)}
      = \frac{(1 - c) \, p(y = 1 | x)}{1 - p(s = 1 | x)}
      = \frac{(1 - c) \, p(s = 1 | x) / c}{1 - p(s = 1 | x)}
      = \frac{1 - c}{c} \cdot \frac{p(s = 1 | x)}{1 - p(s = 1 | x)}, (17)
    where c = p(s = 1 | y = 1). Hence,
    E_{p(x,y,s)}[h(x, y)] = \int_{X \times Y \times S} h(x, y) \, p(x, y, s) \, dx dy ds
      = \int_{X} p(x) \left[ p(s = 1 | x) h(x, 1) + p(s = 0 | x) \left( p(y = 1 | x, s = 0) h(x, 1) + p(y = 0 | x, s = 0) h(x, 0) \right) \right] dx

  40. The plug-in estimator of E_{p(x,y,s)}[h(x, y)] is then
    \frac{1}{n_tr} \left( \sum_{(x, s=1)} h(x, 1) + \sum_{(x, s=0)} \left[ w(x) h(x, 1) + (1 - w(x)) h(x, 0) \right] \right), (18)
    where
    w(x) = p(y = 1 | x, s = 0) = \frac{1 - c}{c} \cdot \frac{p(s = 1 | x)}{1 - p(s = 1 | x)}. (19)
    PU learning can therefore be viewed as importance weighting based on the probability that an unlabelled instance would receive a label.
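Eqs. (18)–(19) in code, with a hypothetical labelling propensity p(s = 1 | x) and labelling frequency c (all quantities toy values, not estimated from data):

```python
def pu_weight(p_s1_given_x, c):
    # w(x) = p(y=1 | x, s=0) = (1-c)/c * p(s=1|x) / (1 - p(s=1|x))  (Eq. (19))
    return (1 - c) / c * p_s1_given_x / (1 - p_s1_given_x)

def pu_plugin_estimate(h, labeled_x, unlabeled_x, p_s1, c):
    # Plug-in estimator of E[h(x, y)] (Eq. (18)): labelled examples count as
    # positive; each unlabelled example is a w(x)/(1-w(x)) mixture of both labels.
    n = len(labeled_x) + len(unlabeled_x)
    total = sum(h(x, 1) for x in labeled_x)
    total += sum(pu_weight(p_s1(x), c) * h(x, 1)
                 + (1 - pu_weight(p_s1(x), c)) * h(x, 0) for x in unlabeled_x)
    return total / n

# With c = 0.5 and p(s=1|x) = 1/3, the weight is (0.5/0.5) * (1/3)/(2/3) = 0.5.
print(pu_weight(1 / 3, 0.5))  # 0.5
est = pu_plugin_estimate(lambda x, y: y, [1.0, 2.0], [3.0], lambda x: 1 / 3, c=0.5)
print(est)  # (1 + 1 + 0.5) / 3 ≈ 0.833
```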


  42. Label Noise Correction
    Label noise correction [55, 56, 66] is the task of detecting and correcting inaccurate labels in a labelled training dataset.
    The best-known label noise correction approach weights the loss function using a matrix of label noise probabilities [58].
    This method amounts to importance-weighted learning that down-weights instances with a high probability of label noise.
    Noise Attention Learning [47] models the label noise probability with an attention mechanism and performs weighted learning with the resulting noise probabilities.


  44. Density Ratio Estimation
    Density ratio estimation is the problem of estimating the density ratio r(x) = p_te(x)/p_tr(x) between the training and test distributions.
    It is a crucial task because a good estimate of r(x) can be plugged into IWERM.

  45. Density Ratio Estimation by Moment Matching
    The basic idea of moment matching is to match the weighted distribution \hat{p}_te(x) = \hat{r}(x) p_tr(x) with the test distribution p_te(x).
    Matching the means of these distributions is commonly used:
    \int x \cdot \hat{r}(x) p_tr(x) \, dx = \int x \cdot p_te(x) \, dx. (20)
    However, matching a finite number of moments is known not to induce the true density ratio, even asymptotically.

  46. Kernel Mean Matching (KMM)
    Kernel Mean Matching [32, 28] performs moment matching in a reproducing kernel Hilbert space H:
    \min_{\hat{r} \in H} \left\| \int K(x, \cdot) \hat{r}(x) p_tr(x) \, dx - \int K(x, \cdot) p_te(x) \, dx \right\|_{H}^{2}. (21)

  47. KLIEP and LSIF
    KLIEP [54, 70] estimates the density ratio by minimizing the KL divergence between p_te(x) and \hat{p}_te(x) = \hat{r}(x) p_tr(x):
    \min_{\hat{r}} D_{KL}[p_te(x) \| \hat{p}_te] = \min_{\hat{r}} \int p_te(x) \log \frac{p_te(x)}{\hat{r}(x) p_tr(x)} \, dx.
    Similarly, Least-Squares Importance Fitting (LSIF) [37] minimizes a squared loss:
    \min_{\hat{r}} \int \left( \hat{r}(x) - r(x) \right)^2 p_tr(x) \, dx.
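A compact uLSIF-style sketch in the spirit of LSIF [37] (Gaussian basis functions centred on test points and an ℓ2-regularised closed form; the hyperparameters σ and λ are picked by hand here rather than by cross validation):

```python
import numpy as np

rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, size=200)   # training samples ~ p_tr
x_te = rng.normal(0.5, 1.0, size=200)   # test samples ~ p_te (shifted mean)

centers = x_te[:20]                     # basis centres (a common heuristic)
sigma, lam = 1.0, 0.1

def phi(x):
    # basis matrix: phi(x)[i, l] = exp(-(x_i - c_l)^2 / (2 σ^2))
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

# Empirical squared-loss objective has the closed-form solution
# α = (H + λI)^{-1} h with H from the training set and h from the test set.
Phi_tr, Phi_te = phi(x_tr), phi(x_te)
H = Phi_tr.T @ Phi_tr / len(x_tr)
h = Phi_te.mean(axis=0)
alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)

r_hat = phi(np.array([-2.0, 0.0, 2.0])) @ alpha   # estimated ratio at x = -2, 0, 2
print(r_hat)
```

Since p_te is shifted to the right of p_tr, the estimated ratio should grow with x.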

  48. Telescoping Density Ratio Estimation (TRE)
    When the two distributions are far apart, the performance of density ratio estimation degrades significantly.
    Telescoping Density Ratio Estimation (TRE) [62] proposes generating intermediate datasets between the two distributions and gradually transporting the data from the source distribution p_0 to the target distribution q = p_m:
    \frac{p_0(x)}{p_m(x)} = \frac{p_0(x)}{p_1(x)} \cdot \frac{p_1(x)}{p_2(x)} \cdots \frac{p_{m-1}(x)}{p_m(x)}. (22)
    Follow-up work [89] has clarified several statistical properties of TRE.
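The identity in Eq. (22) is exact; a quick numerical check with 1-D Gaussian densities whose means interpolate between a source and a target (the bridge distributions are chosen arbitrarily for illustration):

```python
import numpy as np

def gauss(x, mu, sigma=1.0):
    # 1-D Gaussian density N(x; mu, sigma^2)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = 1.0
mus = np.linspace(0.0, 3.0, 4)   # means of p_0, p_1, p_2, p_3 = q

direct = gauss(x, mus[0]) / gauss(x, mus[-1])
telescoped = np.prod([gauss(x, mus[k]) / gauss(x, mus[k + 1]) for k in range(3)])
print(direct, telescoped)        # identical up to floating point
```

The point of TRE is that each consecutive ratio p_k/p_{k+1} is close to 1 and hence much easier to estimate than the direct ratio.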


  50. Importance Weighting and Deep Learning
    The behaviour of importance-weighted ERM in over-parametrized neural networks had long been unexplained.
    Recent work reports that the effect of importance weighting in deep learning decays over the course of training iterations [12].
    It has also been shown experimentally that this phenomenon can be mitigated by L2 regularization and batch normalization.
    The phenomenon has also been suggested to relate to the implicit bias of gradient methods [67, 35, 34, 52].
    Follow-up work has provided theoretical support for these experimental findings [84].

  51. Approximating Importance Weighting with Deep Learning
    There is also much work on implicitly learning the importance weighting function w(x) with neural networks:
    Learning importance weights via meta-learning [61].
    It has been pointed out that learning the weighting function with a neural network induces bias [23]; alternating optimization of the classifier and the weighting function is proposed as a remedy.
    Importance weighting has also been reported to improve robustness to adversarial attacks [14, 85, 33, 29], and it is proposed that such weights can be obtained through adversarial training.

  52. Importance Tempering
    Importance Tempering [48] has been proposed as an alternative to importance weighting.
    Importance Tempering aims to improve the decision boundary of over-parametrized neural networks.
    It achieves this by introducing instance-dependent temperature parameters, corresponding to importance weights, into the softmax function.

  53. References I
    [1] Kamyar Azizzadenesheli, Anqi Liu, Fanny Yang, and Animashree Anandkumar.
    Regularized learning for domain adaptation under label shifts.
    arXiv preprint arXiv:1903.09734, 2019.
    [2] Francis Bach.
    Active learning for misspecified generalized linear models.
    Advances in neural information processing systems, 19, 2006.
    [3] Jessa Bekker and Jesse Davis.
    Learning from positive and unlabeled data: A survey.
    Machine Learning, 109:719–760, 2020.
    [4] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira.
    Analysis of representations for domain adaptation.
    Advances in neural information processing systems, 19, 2006.

  54. References II
    [5] Aharon Ben-Tal, Dick Den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs
    Rennen.
    Robust solutions of optimization problems affected by uncertain probabilities.
    Management Science, 59(2):341–357, 2013.
    [6] Richard A Berk.
    An introduction to sample selection bias in sociological data.
    American sociological review, pages 386–398, 1983.
    [7] Dimitris Bertsimas, Vishal Gupta, and Nathan Kallus.
    Data-driven robust optimization.
    Mathematical Programming, 167:235–292, 2018.
    [8] Dimitris Bertsimas, Melvyn Sim, and Meilin Zhang.
    Adaptive distributionally robust optimization.
    Management Science, 65(2):604–618, 2019.

  55. References III
    [9] Alina Beygelzimer, Sanjoy Dasgupta, and John Langford.
    Importance weighted active learning.
    In Proceedings of the 26th annual international conference on machine learning, pages
    49–56, 2009.
    [10] Alina Beygelzimer, Daniel Hsu, Nikos Karampatziakis, John Langford, and Tong Zhang.
    Efficient active learning.
    In ICML 2011 Workshop on On-line Trading of Exploration and Exploitation, 2011.
    [11] Jose Blanchet, Yang Kang, and Karthyek Murthy.
    Robust wasserstein profile inference and applications to machine learning.
    Journal of Applied Probability, 56(3):830–857, 2019.
    [12] Jonathon Byrd and Zachary Lipton.
    What is the effect of importance weighting in deep learning?
    In International conference on machine learning, pages 872–881. PMLR, 2019.

  56. References IV
    [13] Zhangjie Cao, Kaichao You, Mingsheng Long, Jianmin Wang, and Qiang Yang.
    Learning to transfer examples for partial domain adaptation.
    In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
    pages 2985–2994, 2019.
    [14] Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep
    Mukhopadhyay.
    A survey on adversarial attacks and defences.
    CAAI Transactions on Intelligence Technology, 6(1):25–45, 2021.
    [15] Yee Seng Chan and Hwee Tou Ng.
    Word sense disambiguation with distribution estimation.
    In IJCAI, volume 5, pages 1010–5, 2005.

  57. References V
    [16] Olivier Chapelle.
    Modeling delayed feedback in display advertising.
    In Proceedings of the 20th ACM SIGKDD international conference on Knowledge
    discovery and data mining, pages 1097–1105, 2014.
    [17] Gabriela Csurka.
    Domain adaptation for visual applications: A comprehensive survey.
    arXiv preprint arXiv:1702.05374, 2017.
    [18] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie.
    Class-balanced loss based on effective number of samples.
    In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
    pages 9268–9277, 2019.

  58. References VI
    [19] Erick Delage and Yinyu Ye.
    Distributionally robust optimization under moment uncertainty with application to
    data-driven problems.
    Operations research, 58(3):595–612, 2010.
    [20] John Duchi and Hongseok Namkoong.
    Learning models with uniform performance via distributionally robust optimization.
    arXiv preprint arXiv:1810.08750, 2018.
    [21] Charles Elkan and Keith Noto.
    Learning classifiers from only positive and unlabeled data.
    In Proceedings of the 14th ACM SIGKDD international conference on Knowledge
    discovery and data mining, pages 213–220, 2008.

  59. References VII
    [22] Peyman Mohajerin Esfahani and Daniel Kuhn.
    Data-driven distributionally robust optimization using the wasserstein metric:
    Performance guarantees and tractable reformulations.
    arXiv preprint arXiv:1505.05116, 2015.
    [23] Tongtong Fang, Nan Lu, Gang Niu, and Masashi Sugiyama.
    Rethinking importance weighting for deep learning under distribution shift.
    Advances in neural information processing systems, 33:11996–12007, 2020.
    [24] Abolfazl Farahani, Sahar Voghoei, Khaled Rasheed, and Hamid R Arabnia.
    A brief review of domain adaptation.
    Advances in data science and information engineering: proceedings from ICDATA 2020
    and IKE 2020, pages 877–894, 2021.

  60. References VIII
    [25] Sebastian Farquhar, Yarin Gal, and Tom Rainforth.
    On statistical bias in active learning: How and when to fix it.
    arXiv preprint arXiv:2101.11665, 2021.
    [26] Davit Gogolashvili, Matteo Zecchin, Motonobu Kanagawa, Marios Kountouris, and
    Maurizio Filippone.
    When is importance weighting correction needed for covariate shift adaptation?
    arXiv preprint arXiv:2303.04020, 2023.
    [27] Joel Goh and Melvyn Sim.
    Distributionally robust optimization and its tractable approximations.
    Operations research, 58(4-part-1):902–917, 2010.

  61. References IX
    [28] Arthur Gretton, Alex Smola, Jiayuan Huang, Marcel Schmittfull, Karsten Borgwardt, and
    Bernhard Schölkopf.
    Covariate shift by kernel mean matching.
    Dataset shift in machine learning, 3(4):5, 2009.
    [29] Chuan Guo, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian Weinberger.
    Simple black-box adversarial attacks.
    In International Conference on Machine Learning, pages 2484–2493. PMLR, 2019.
    [30] Zongbo Han, Zhipeng Liang, Fan Yang, Liu Liu, Lanqing Li, Yatao Bian, Peilin Zhao,
    Bingzhe Wu, Changqing Zhang, and Jianhua Yao.
    Umix: Improving importance weighting for subpopulation shift via uncertainty-aware
    mixup.
    Advances in Neural Information Processing Systems, 35:37704–37718, 2022.
    Masanari Kimura (SOKENDAI) Importance Weighting and its Applications November 13, 2023 61 / 80

    View full-size slide

  62. References X
    [31] Wei-Ning Hsu and Hsuan-Tien Lin.
    Active learning by learning.
    In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.
    [32] Jiayuan Huang, Arthur Gretton, Karsten Borgwardt, Bernhard Schölkopf, and Alex
    Smola.
    Correcting sample selection bias by unlabeled data.
    Advances in neural information processing systems, 19, 2006.
    [33] Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, and Pieter Abbeel.
    Adversarial attacks on neural network policies.
    arXiv preprint arXiv:1702.02284, 2017.
    [34] Ziwei Ji and Matus Telgarsky.
    Gradient descent aligns the layers of deep linear networks.
    arXiv preprint arXiv:1810.02032, 2018.

  63. References XI
    [35] Ziwei Ji and Matus Telgarsky.
    Risk and parameter convergence of logistic regression.
    arXiv preprint arXiv:1803.07300, 2018.
    [36] Jing Jiang and ChengXiang Zhai.
    Instance weighting for domain adaptation in nlp.
    ACL, 2007.
    [37] Takafumi Kanamori, Shohei Hido, and Masashi Sugiyama.
    A least-squares approach to direct importance estimation.
    The Journal of Machine Learning Research, 10:1391–1445, 2009.
    [38] Masanari Kimura and Hideitsu Hino.
    Information geometrically generalized covariate shift adaptation.
    Neural Computation, 34(9):1944–1977, 2022.

  64. References XII
    [39] Ryuichi Kiryo, Gang Niu, Marthinus C Du Plessis, and Masashi Sugiyama.
    Positive-unlabeled learning with non-negative risk estimator.
    Advances in neural information processing systems, 30, 2017.
    [40] Jannik Kossen, Sebastian Farquhar, Yarin Gal, and Tom Rainforth.
    Active testing: Sample-efficient model evaluation.
    In International Conference on Machine Learning, pages 5753–5763. PMLR, 2021.
    [41] Daniel Kottke, Jim Schellinger, Denis Huseljic, and Bernhard Sick.
    Limitations of assessing active learning performance at runtime.
    arXiv preprint arXiv:1901.10338, 2019.
    [42] Wouter M Kouw and Marco Loog.
    On regularization parameter estimation under covariate shift.
    In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 426–431.
    IEEE, 2016.

  65. References XIII
    [43] Sofia Ira Ktena, Alykhan Tejani, Lucas Theis, Pranay Kumar Myana, Deepak Dilipkumar,
    Ferenc Huszár, Steven Yoo, and Wenzhe Shi.
Addressing delayed feedback for continuous training with neural networks in CTR prediction.
    In Proceedings of the 13th ACM conference on recommender systems, pages 187–195,
    2019.
    [44] Daniel Levy, Yair Carmon, John C Duchi, and Aaron Sidford.
    Large-scale methods for distributionally robust optimization.
    Advances in Neural Information Processing Systems, 33:8847–8860, 2020.
    [45] Zachary Lipton, Yu-Xiang Wang, and Alexander Smola.
    Detecting and correcting for label shift with black box predictors.
In International Conference on Machine Learning, pages 3122–3130. PMLR, 2018.

  66. References XIV
    [46] Wei Liu and Sanjay Chawla.
Class confidence weighted kNN algorithms for imbalanced data sets.
    In Advances in Knowledge Discovery and Data Mining: 15th Pacific-Asia Conference,
    PAKDD 2011, Shenzhen, China, May 24-27, 2011, Proceedings, Part II 15, pages
    345–356. Springer, 2011.
    [47] Yangdi Lu, Yang Bo, and Wenbo He.
    Noise attention learning: Enhancing noise robustness by gradient scaling.
    Advances in Neural Information Processing Systems, 35:23164–23177, 2022.
    [48] Yiping Lu, Wenlong Ji, Zachary Izzo, and Lexing Ying.
    Importance tempering: Group robustness for overparameterized models.
    arXiv preprint arXiv:2209.08745, 2022.

  67. References XV
    [49] Jose Ignacio Segovia Martin, Santiago Mazuelas, and Anqi Liu.
    Double-weighting for covariate shift adaptation.
    In International Conference on Machine Learning, pages 30439–30457. PMLR, 2023.
    [50] Nima Mashayekhi.
    An Adversarial Approach to Importance Weighting for Domain Adaptation.
    PhD thesis, 2022.
    [51] Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip Torr, and
    Puneet Dokania.
    Calibrating deep neural networks using focal loss.
    Advances in Neural Information Processing Systems, 33:15288–15299, 2020.

  68. References XVI
    [52] Mor Shpigel Nacson, Suriya Gunasekar, Jason Lee, Nathan Srebro, and Daniel Soudry.
    Lexicographic and depth-sensitive margins in homogeneous and non-homogeneous deep
    models.
    In International Conference on Machine Learning, pages 4683–4692. PMLR, 2019.
    [53] Tuan Duong Nguyen, Marthinus Christoffel, and Masashi Sugiyama.
    Continuous target shift adaptation in supervised learning.
    In Asian Conference on Machine Learning, pages 285–300. PMLR, 2016.
    [54] XuanLong Nguyen, Martin J Wainwright, and Michael I Jordan.
    Estimating divergence functionals and the likelihood ratio by convex risk minimization.
    IEEE Transactions on Information Theory, 56(11):5847–5861, 2010.
    [55] Bryce Nicholson, Victor S Sheng, and Jing Zhang.
    Label noise correction and application in crowdsourcing.
    Expert Systems with Applications, 66:149–162, 2016.

  69. References XVII
    [56] Bryce Nicholson, Jing Zhang, Victor S Sheng, and Zhiheng Wang.
    Label noise correction methods.
    In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA),
    pages 1–9. IEEE, 2015.
    [57] Vishal M Patel, Raghuraman Gopalan, Ruonan Li, and Rama Chellappa.
    Visual domain adaptation: A survey of recent advances.
IEEE Signal Processing Magazine, 32(3):53–69, 2015.
    [58] Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu.
    Making deep neural networks robust to label noise: A loss correction approach.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1944–1952, 2017.

  70. References XVIII
    [59] Barbara Plank, Anders Johannsen, and Anders Søgaard.
    Importance weighting and unsupervised domain adaptation of pos taggers: a negative
    result.
    In Proceedings of the 2014 Conference on Empirical Methods in Natural Language
    Processing (EMNLP), pages 968–973, 2014.
    [60] Hamed Rahimian and Sanjay Mehrotra.
    Distributionally robust optimization: A review.
    arXiv preprint arXiv:1908.05659, 2019.
    [61] Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun.
    Learning to reweight examples for robust deep learning.
In International Conference on Machine Learning, pages 4334–4343. PMLR, 2018.

  71. References XIX
    [62] Benjamin Rhodes, Kai Xu, and Michael U Gutmann.
    Telescoping density-ratio estimation.
Advances in Neural Information Processing Systems, 33:4905–4916, 2020.
    [63] Abdollah Safari, Rachel MacKay Altman, and Thomas M Loughin.
    Display advertising: Estimating conversion probability efficiently.
    arXiv preprint arXiv:1710.08583, 2017.
    [64] Shibani Santurkar, Dimitris Tsipras, and Aleksander Madry.
    Breeds: Benchmarks for subpopulation shift.
    arXiv preprint arXiv:2008.04859, 2020.
    [65] Hidetoshi Shimodaira.
    Improving predictive inference under covariate shift by weighting the log-likelihood
    function.
Journal of Statistical Planning and Inference, 90(2):227–244, 2000.

  72. References XX
    [66] Hwanjun Song, Minseok Kim, Dongmin Park, Yooju Shin, and Jae-Gil Lee.
    Learning from noisy labels with deep neural networks: A survey.
    IEEE Transactions on Neural Networks and Learning Systems, 2022.
    [67] Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro.
    The implicit bias of gradient descent on separable data.
    The Journal of Machine Learning Research, 19(1):2822–2878, 2018.
    [68] Masashi Sugiyama.
    Active learning for misspecified models.
Advances in Neural Information Processing Systems, 18, 2005.
    [69] Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller.
    Covariate shift adaptation by importance weighted cross validation.
    Journal of Machine Learning Research, 8(5), 2007.

  73. References XXI
    [70] Masashi Sugiyama, Shinichi Nakajima, Hisashi Kashima, Paul Buenau, and Motoaki
    Kawanabe.
    Direct importance estimation with model selection and its application to covariate shift
    adaptation.
Advances in Neural Information Processing Systems, 20, 2007.
    [71] Marcelo Tallis and Pranjul Yadav.
    Reacting to variations in product demand: An application for conversion rate (cr)
    prediction in sponsored search.
    In 2018 IEEE International Conference on Big Data (Big Data), pages 1856–1864. IEEE,
    2018.
    [72] Ryan J Tibshirani, Rina Foygel Barber, Emmanuel Candes, and Aaditya Ramdas.
    Conformal prediction under covariate shift.
Advances in Neural Information Processing Systems, 32, 2019.

  74. References XXII
    [73] Katrin Tomanek.
    Resource-aware annotation through active learning.
    2010.
    [74] Katrin Tomanek and Katherina Morik.
    Inspecting sample reusability for active learning.
    In Active Learning and Experimental Design workshop In conjunction with AISTATS
    2010, pages 169–181. JMLR Workshop and Conference Proceedings, 2011.
    [75] Van-Tinh Tran.
    Selection bias correction in supervised learning with importance weight.
    PhD thesis, Université de Lyon, 2017.

  75. References XXIII
    [76] Van-Tinh Tran and Alex Aussem.
    Correcting a class of complete selection bias with external data based on importance
    weight estimation.
    In International Conference on Neural Information Processing, pages 111–118. Springer,
    2015.
    [77] Gijs Van Tulder.
    Sample reusability in importance-weighted active learning.
    2012.
    [78] Francis Vella.
    Estimating models with sample selection bias: a survey.
    Journal of Human Resources, pages 127–169, 1998.

  76. References XXIV
    [79] Mei Wang and Weihong Deng.
    Deep visual domain adaptation: A survey.
    Neurocomputing, 312:135–153, 2018.
    [80] Garrett Wilson and Diane J Cook.
    A survey of unsupervised deep domain adaptation.
    ACM Transactions on Intelligent Systems and Technology (TIST), 11(5):1–46, 2020.
    [81] Christopher Winship and Robert D Mare.
    Models for sample selection bias.
Annual Review of Sociology, 18(1):327–350, 1992.

  77. References XXV
    [82] Rui Xia, Zhenchun Pan, and Feng Xu.
    Instance weighting for domain adaptation via trading off sample selection bias and
    variance.
    In Proceedings of the 27th International Joint Conference on Artificial Intelligence,
    Stockholm, Sweden, pages 13–19, 2018.
    [83] Ni Xiao and Lei Zhang.
    Dynamic weighted learning for unsupervised domain adaptation.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15242–15251, 2021.
    [84] Da Xu, Yuting Ye, and Chuanwei Ruan.
    Understanding the role of importance weighting for deep learning.
    arXiv preprint arXiv:2103.15209, 2021.

  78. References XXVI
    [85] Han Xu, Yao Ma, Hao-Chen Liu, Debayan Deb, Hui Liu, Ji-Liang Tang, and Anil K Jain.
    Adversarial attacks and defenses in images, graphs and text: A review.
    International Journal of Automation and Computing, 17:151–178, 2020.
    [86] Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, and Masashi
    Sugiyama.
    Relative density-ratio estimation for robust distribution comparison.
Neural Computation, 25(5):1324–1370, 2013.
    [87] Yuzhe Yang, Haoran Zhang, Dina Katabi, and Marzyeh Ghassemi.
    Change is hard: A closer look at subpopulation shift.
    arXiv preprint arXiv:2302.12254, 2023.
[88] Shota Yasui, Gota Morishita, Komei Fujita, and Masashi Shibata.
    A feedback shift correction in predicting conversion rates under delayed feedback.
    In Proceedings of The Web Conference 2020, pages 2740–2746, 2020.

  79. References XXVII
    [89] Jiayang Yin.
    On the Improvement of Density Ratio Estimation via Probabilistic Classifier–Theoretical
    Study and Its Applications.
PhD thesis, The University of British Columbia (Vancouver), 2023.
    [90] Yuya Yoshikawa and Yusaku Imai.
    A nonparametric delayed feedback model for conversion rate prediction.
    arXiv preprint arXiv:1802.00255, 2018.
    [91] Bianca Zadrozny.
    Learning and evaluating classifiers under sample selection bias.
In Proceedings of the Twenty-First International Conference on Machine Learning, page 114, 2004.

  80. References XXVIII
    [92] Kun Zhang, Bernhard Schölkopf, Krikamol Muandet, and Zhikun Wang.
    Domain adaptation under target and conditional shift.
    In International Conference on Machine Learning, pages 819–827. PMLR, 2013.
    [93] Eric Zhao, Anqi Liu, Animashree Anandkumar, and Yisong Yue.
    Active learning under label shift.
    In International Conference on Artificial Intelligence and Statistics, pages 3412–3420.
    PMLR, 2021.