
 20200906_ACL2020_metric_for_ordinal_classification_YoheiKikuta

yoppe

September 06, 2020

Transcript

  1. 2020/09/06 @yohei_kikuta
An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results
Enrique Amigó (UNED, Madrid, Spain), Julio Gonzalo (UNED, Madrid, Spain), Stefano Mizzaro (University of Udine, Udine, Italy), Jorge Carrillo-de-Albornoz (UNED, Madrid, Spain)
Abstract: In Ordinal Classification tasks, items have to be assigned to classes that have a relative ordering, such as positive, neutral, negative in sentiment analysis. Remarkably, the most popular evaluation metrics for ordinal classification tasks either ignore relevant information […] classification measures ignore the ordering between classes, ranking metrics ignore category matching, and value prediction metrics are used by assuming (usually equal) numeric intervals between categories. In this paper we propose a metric designed to evaluate Ordinal Classification systems which […]
  2. I'm Yohei Kikuta from Ubie, Inc.
• Accounts
• https://github.com/yoheikikuta
• https://twitter.com/yohei_kikuta
• https://yoheikikuta.github.io/
• WE ARE HIRING!!!
• https://herp.careers/v1/ubie
  3. Paper introduced: An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results
• https://www.aclweb.org/anthology/2020.acl-main.363/
• My notes: https://github.com/yoheikikuta/paper-reading/issues/54
• Proposes an evaluation metric for ordinal classification tasks, starting from the properties such a metric should satisfy
• The properties to satisfy are Ordinal Invariance, Ordinal Monotonicity, and Class Imbalance
• Defines a metric, based on the class ordering and the class distribution, that satisfies all three
• Experiments show it is more useful than the metrics conventionally used
  4. Which is "worse"? Consider assigning one of {reject, weak reject, undecided, weak accept, accept} when reviewing papers. When the true rating of a paper is accept, which of the following mistakes is worse?
• Predicting weak reject
• Predicting weak accept
  5. Which is "worse"? (same setup as above) When the true rating of a paper is accept, which of the following mistakes is worse?
• Predicting weak reject → worse (a category three steps below the truth)
• Predicting weak accept → less bad (a category only one step below the truth)
  6. (Part 2) Which is "worse"? When a system confuses weak reject ⇔ weak accept, in which of the following situations is the mistake worse?
[Figure 1: two review-score distributions over (reject, weak reject, undecided, weak accept, accept); # papers on the left = (7, 105, 193, 90, 7), on the right = (180, 10, 3, 10, 173).] Caption: "In the left distribution, weak accept vs. weak reject would be a strong disagreement between reviewers (i.e., the classes are distant), because in practice these are almost the extreme cases of the scale (reviewers rarely go for accept or reject). In the right distribution the situation is the opposite: reviewers tend to take a clear stance, which makes weak accept and weak reject closer assessments than in the left case." Figure quoted from https://www.aclweb.org/anthology/2020.acl-main.363/
  7. (Part 2) Which is "worse"? (same Figure 1 as above, quoted from https://www.aclweb.org/anthology/2020.acl-main.363/)
• Left distribution → worse (the mistake changes the evaluation a lot; the two classes are effectively distant)
• Right distribution → less bad (little difference either way; the two classes are effectively close)
  8. Ordinal classification is not ___
• Not n-ary classification: when the truth is AC, predicting weak AC and predicting weak REJ are not equally wrong (the latter is worse)
• Not ranking prediction: (AC, weak AC, undecided) ≠ (undecided, weak REJ, REJ), i.e., which categories are matched matters, not just their order
• Not value prediction: the gap between AC and weak AC and the gap between weak AC and weak REJ are not comparable numeric intervals
• Not linear correlation: outputs can correlate highly with the truth while the predicted values themselves disagree (see the sketch below)
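To make the last point concrete, here is a tiny sketch (my own toy example, not from the paper or the slides) showing that a system can achieve perfect linear correlation with the ground truth while matching zero classes exactly:

```python
# Toy example: perfect Pearson correlation, zero exact class matches.
import numpy as np

g = np.array([0, 1, 2, 3, 4])  # ground-truth ordinal classes
s = g + 1                      # system output: every item shifted up one class

r = np.corrcoef(s, g)[0, 1]    # Pearson correlation coefficient
acc = np.mean(s == g)          # fraction of exact class matches

print(f"Pearson r = {r:.2f}, accuracy = {acc:.2f}")  # -> r = 1.00, accuracy = 0.00
```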
  9. Paper overview
• Problem: ordinal classification appears frequently in NLP, but it is not being evaluated properly
• Goal: build an evaluation metric that respects the ordinal scale
• Content: define the properties an evaluation metric should have, propose a concrete construction, and show experimentally that it captures well the properties ordinal classification predictions should satisfy
The authors are the people behind http://evall.uned.es/
Code for computing the metric is provided in Java: https://github.com/EvALLTEAM/EvALLToolkit
  10. Properties a metric Eff for ordinal classification should satisfy
Notation: s is the system output, g the ground truth, d ∈ D the data.
• Ordinal Invariance: Eff(s, g) = Eff(f(s), f(g)) where f is a strictly increasing function
• Ordinal Monotonicity: Eff(s′, g) > Eff(s, g) if ∃d. (s(d) ≠ s′(d)) ∧ (∀d. ((s(d) > s′(d) ≥ g(d)) ∨ (s(d) = s′(d)))), i.e., moving some predictions strictly closer to the truth while leaving the rest unchanged must increase the score
• Class Imbalance: Eff(g_{d1→c2}, g) > Eff(g_{d3→c2}, g) where n_{c1} > n_{c3}; here g_{d→c} is the ground truth with item d reassigned to class c, so misclassifying an item whose true class c1 is frequent should cost less than misclassifying an item from an infrequent class c3
[Figure 2, quoted from https://www.aclweb.org/anthology/2020.acl-main.363/] Caption: "Illustration of desirable formal properties for Ordinal Classification. Each bin is a system output, where columns represent ordered classes assigned by the system, and colors represent the items' true classes, ordered from black to white. '=' means that both outputs should have the same quality, and '>' that the left output should receive a higher metric value than the right output."
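A quick toy check (my own illustration, not the paper's code) of why these properties discriminate between existing metrics: accuracy is ordinally invariant because it only tests equality, while MAE is not, since it assumes fixed numeric intervals between classes:

```python
# Accuracy is invariant under a strictly increasing relabeling f; MAE is not.
import numpy as np

g = np.array([0, 1, 2, 2, 4])          # ground truth
s = np.array([0, 2, 2, 3, 4])          # system output

def f(x):
    # strictly increasing relabeling of classes 0..4 -> 0, 1, 2, 5, 9
    return np.array([0, 1, 2, 5, 9])[x]

acc = lambda a, b: np.mean(a == b)
mae = lambda a, b: np.mean(np.abs(a - b))

print(acc(s, g), acc(f(s), f(g)))      # 0.6 0.6 -> accuracy is ordinally invariant
print(mae(s, g), mae(f(s), f(g)))      # 0.4 0.8 -> MAE is not
```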
  11. Closeness Information Quantity (CIQ)
We want to define "closeness" based on the data distribution. Following an information-theoretic idea, a and b are defined to be "close" when the probability of observing an item x between a and b is small.
P(x ⪯_ORD^b a): the probability that, when compared with a, x is at least as close to b
CIQ_ORD(a, b) := −log P(x ⪯_ORD^b a)
CIQ_ORD(s(d), g(d)) = −log P(x ⪯_ORD^{g(d)} s(d))
  12. Closeness Evaluation Measure (CEM)
Normalize CIQ by its maximum value and sum over all the data:
CEM_ORD(s, g) = Σ_{d∈D} CIQ_ORD(s(d), g(d)) / Σ_{d∈D} CIQ_ORD(g(d), g(d))
• The maximum is 1, attained when s(d) = g(d) for all d
• The minimum depends on the distribution
• A "distant" mistake lowers CEM more than a "close" mistake
  13. Concrete expression for ordinal classification
x ⪯_ORD^b a means a ≤ x ≤ b or b ≤ x ≤ a.
With n_i the number of items such that g(d) = c_i, and N the total number of items:
CIQ_ORD(c_i, c_j) = −log( (n_i/2 + Σ_{k=i+1}^{j} n_k) / N )
• When c_i = c_j this reduces to CIQ_ORD(c_i, c_i) = −log(n_i / 2N)
• The 1/2 factor on n_i is empirical (according to the authors)
A runnable sketch of these formulas follows.
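A minimal NumPy sketch of these formulas (my own reimplementation, not the authors' Java toolkit; log base 2 is assumed because it reproduces the worked example on the next slide):

```python
import numpy as np

def ciq_ord(i, j, counts):
    """CIQ_ORD(c_i, c_j) = -log2((n_i/2 + sum of n_k between c_i and c_j) / N).

    counts[k] is n_k, the number of gold items in class c_k; i and j are
    0-based class indices in ordinal order."""
    N = counts.sum()
    lo, hi = min(i, j), max(i, j)
    between = counts[lo:hi + 1].sum() - counts[i]  # classes from c_i through c_j, minus n_i itself
    return -np.log2((counts[i] / 2 + between) / N)

def cem_ord(s, g, counts):
    """CEM_ORD(s, g): total CIQ of the output, normalized by the perfect score."""
    num = sum(ciq_ord(si, gi, counts) for si, gi in zip(s, g))
    den = sum(ciq_ord(gi, gi, counts) for gi in g)  # CIQ(c_i, c_i) = -log2(n_i / 2N)
    return num / den
```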
  14. Worked example
(Figure 1 distributions again, quoted from https://www.aclweb.org/anthology/2020.acl-main.363/: left # papers = (7, 105, 193, 90, 7), N = 402; right # papers = (180, 10, 3, 10, 173), N = 376, over (reject, weak reject, undecided, weak accept, accept).)
Left: CIQ_ORD(weak accept, weak reject) = −log( (90/2 + 193 + 105) / 402 ) ≃ 0.23
Right: CIQ_ORD(weak accept, weak reject) = −log( (10/2 + 3 + 10) / 376 ) ≃ 4.38
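The ciq_ord sketch above reproduces both numbers on this slide:

```python
import numpy as np  # ciq_ord as defined after slide 13

left = np.array([7, 105, 193, 90, 7])    # left distribution, N = 402
right = np.array([180, 10, 3, 10, 173])  # right distribution, N = 376

# class indices: 0 reject, 1 weak reject, 2 undecided, 3 weak accept, 4 accept
print(ciq_ord(3, 1, left))   # ~0.23: weak accept and weak reject are distant here
print(ciq_ord(3, 1, right))  # ~4.38: here they are close, so the confusion is milder
```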
  15. Generalization to other scale types
Define closeness for each scale type via x ⪯_T^b a ⟺ ∃f ∈ F_T (|f(x) − f(b)| ≤ |f(a) − f(b)|):
• Nominal scale, F_NOM: bijective functions → (b = x) ∨ (b ≠ a)
• Ordinal scale, F_ORD: strictly increasing functions → (a ≤ x ≤ b) ∨ (b ≤ x ≤ a)
• Interval scale, F_INT: linear functions → |b − x| ≤ |b − a|
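As a small illustration (my own sketch, assuming numeric class encodings), the three reduced closeness conditions can be written directly as predicates:

```python
def x_closer_nom(x, a, b):
    # nominal scale: the reduced condition (b == x) or (b != a)
    return (b == x) or (b != a)

def x_closer_ord(x, a, b):
    # ordinal scale: x lies between a and b (inclusive)
    return (a <= x <= b) or (b <= x <= a)

def x_closer_int(x, a, b):
    # interval scale: x is numerically at least as close to b as a is
    return abs(b - x) <= abs(b - a)
```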
  16. The proposed metric satisfies the required properties

Table 2: Constraint-based Metric Analysis (quoted from https://www.aclweb.org/anthology/2020.acl-main.363/; ✓ = property satisfied)

Metric family             Metrics                              Ord.Inv.  Ord.Mon.  Imb.
Classification metrics    Acc                                  ✓         -         -
                          Acc with n                           ✓         -         -
                          Macro Avg Acc, Cohen's kappa         ✓         -         ✓
                          F-measure avg. across classes        ✓         -         ✓
Value prediction          MAE, MSE                             -         ✓         -
                          Macro Avg. MAE/MSE                   -         ✓         ✓
                          Weighted kappa                       -         ✓         ✓
                          Rennie & Srebro loss function        -         ✓         -
                          Cosine similarity                    -         ✓         -
Correlation coefficients  Linear correlation                   -         -         -
                          Ordinal: Kendall (Tau-b), Spearman   ✓         -         ✓
                          Kendall (Tau-a)                      ✓         -         -
                          Reliability and Sensitivity          ✓         -         ✓
Clustering                MI, Purity and Inv. Purity           ✓         -         ✓
Path based                Ordinal Classification Index         ✓         -         -
CEM                       CEM_NOM                              ✓         -         ✓
                          CEM_INT                              -         ✓         ✓
                          CEM_ORD                              ✓         ✓         ✓

CEM_ORD satisfies all three of Ordinal Invariance, Ordinal Monotonicity, and Class Imbalance.
  17. Experimental design
Assume that a metric for ordinal classification should capture the following kinds of information:
• Accuracy (acc): class-matching information
• Kendall Tau-a (Kendall): ordering information
• Mutual Information (MI): class-imbalance information
When comparing the outputs of two systems s and s′, measure whether the proposed metric also improves whenever all three of these metrics improve.
  18. Meta-evaluation metric
Define coverage as below (m is the metric under study, M = {acc, Kendall, MI}); it is computed over multiple system outputs for a given dataset:
Cov_M(m) = Spea( m(s) − m(s′), UIR_M(s, s′) )
• Spea: Spearman rank correlation coefficient (looks only at the ordering of the scores)
• m(s) − m(s′): difference in the metric m under study
• Unanimous Improvement Ratio (UIR): the fraction of test cases in which acc, Kendall, and MI all improve; from the paper, UIR_M(s, s′) = ( |{t ∈ T : s_t ⪰_M s′_t}| − |{t ∈ T : s′_t ⪰_M s_t}| ) / |T|, where s_t ⪰_M s′_t means ∀m ∈ M. m(s_t) ≥ m(s′_t)
(From the paper: the three reference metrics are chosen as complementary views: Accuracy for class matching, Kendall's Tau-a, without counting ties, for class ordering, and Mutual Information, a clustering metric that accentuates the effect of small classes. While robustness focuses on consistency across data sets, UIR focuses on consistency across metrics.)
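A sketch of this meta-evaluation (my own reading of the definitions above; the names are mine, and ref_scores_*[t] holds the (acc, Kendall, MI) triple of a system on test case t):

```python
from scipy.stats import spearmanr

def uir(ref_scores_s, ref_scores_sp):
    """Unanimous Improvement Ratio between systems s and s' over test cases."""
    wins = sum(all(a >= b for a, b in zip(st, spt))
               for st, spt in zip(ref_scores_s, ref_scores_sp))
    losses = sum(all(b >= a for a, b in zip(st, spt))
                 for st, spt in zip(ref_scores_s, ref_scores_sp))
    return (wins - losses) / len(ref_scores_s)

def coverage(metric_diffs, uir_values):
    """Cov_M(m): Spearman correlation between m(s) - m(s') and UIR_M(s, s')
    across system pairs."""
    return spearmanr(metric_diffs, uir_values).correlation
```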
  19. Experimental data
• Synthetic data
• 20,000 items assigned to 11 classes, drawn with mean 4 and variance 1–3
• System outputs that make mistakes with probability {0.1, 0.2, …, 1.0}, generated for each of 5 error patterns, 500 outputs in total
• Error patterns: mean, random, category displacement, ordinal random, ordinal displacement
• Real data
• Shared-task data where the ground truth and multiple system outputs are available
• RepLab2013 for polarity analysis; SemEval2014 and 2015 for sentiment analysis
RepLab2013 Dataset: http://nlp.uned.es/replab2013/
SemEval2014, 2015: http://alt.qcri.org/semeval2014/ , http://alt.qcri.org/semeval2015/
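A rough sketch of how such synthetic data could be generated (my paraphrase of the slide, not the paper's exact protocol; only the random-error corruption pattern is shown):

```python
import numpy as np

rng = np.random.default_rng(0)

# 20,000 items over 11 ordered classes (0..10), roughly Gaussian with mean 4
gold = np.clip(np.rint(rng.normal(loc=4.0, scale=np.sqrt(2.0), size=20_000)),
               0, 10).astype(int)

def random_error_system(gold, p):
    """One corruption pattern: replace each label with a uniform random class w.p. p."""
    out = gold.copy()
    flip = rng.random(len(gold)) < p
    out[flip] = rng.integers(0, 11, size=int(flip.sum()))
    return out

systems = [random_error_system(gold, p) for p in np.arange(0.1, 1.01, 0.1)]
```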
  20. The proposed metric has the best coverage

Table 3 (quoted from https://www.aclweb.org/anthology/2020.acl-main.363/): Metric Coverage: Spearman correlation between single metrics and the UIR combination of Mutual Information, Accuracy, and Kendall across system pairs in both the synthetic and real data sets.

Synthetic-data columns: all systems / minus sRand / minus sprox / minus smaj / minus stDisp / minus soDisp
Real-data columns: Replab2013 / SEM-2014 T9-A / T9-B / SEM-2015 T10-A / T10-B / T10-C

Reference metrics in UIR:
  Accuracy      0.81 0.77 0.78 0.78 0.94 0.77 | 0.75 0.90 0.98 0.85 0.94 0.80
  Kendall       0.84 0.81 0.82 0.82 0.93 0.82 | 0.88 0.94 0.98 0.84 0.97 0.88
  MI            0.84 0.82 0.84 0.82 0.93 0.82 | 0.91 0.97 0.99 0.93 0.98 0.93
Classification metrics:
  F-measure     0.83 0.80 0.82 0.81 0.93 0.81 | 0.66 0.90 0.98 0.91 0.98 0.92
  MAAC          0.83 0.81 0.82 0.79 0.91 0.81 | 0.84 0.86 0.97 0.84 0.95 0.82
  Kappa         0.81 0.78 0.79 0.77 0.94 0.77 | 0.44 0.95 0.99 0.93 0.98 0.97
  Acc with 1    0.79 0.75 0.77 0.80 0.85 0.79 | 0.23 0.82 0.60 0.31 0.35 -0.19
Error minimization:
  MAE           0.84 0.82 0.83 0.87 0.86 0.84 | 0.81 0.96 0.95 0.95 0.87 0.56
  MAEm          0.74 0.73 0.74 0.80 0.76 0.73 | 0.73 0.95 0.88 0.91 0.74 0.30
  MSE           0.89 0.87 0.87 0.88 0.93 0.88 | 0.28 0.87 0.98 0.63 0.97 0.93
  MSEm          0.83 0.80 0.80 0.82 0.90 0.83 | 0.10 0.85 0.94 0.48 0.91 0.52
Correlation coefficients:
  Pearson       0.77 0.79 0.74 0.73 0.83 0.79 | 0.91 0.97 0.98 0.96 0.97 0.79
  Spearman      0.72 0.67 0.69 0.77 0.76 0.70 | 0.07 0.96 0.98 0.97 0.98 0.80
Measurement theory:
  CEM_ORD       0.91 0.89 0.90 0.90 0.95 0.89 | 0.94 0.96 0.99 0.98 0.99 0.96
  CEM_ORD flat  0.87 0.84 0.86 0.88 0.89 0.87 | 0.82 0.96 0.96 0.94 0.92 0.65

• The proposed metric correlates most highly with the cases where acc, Kendall, and MI all improve (i.e., it captures all three characteristics)
• Without log scaling (CEM_ORD flat) the results degrade, so the log scaling matters
  21. Summary
• Ordinal classification lacked an appropriate evaluation metric
• The paper enumerates the properties a metric should satisfy once the ordinal scale is taken into account
• It proposes the Closeness Evaluation Measure (CEM), which satisfies those properties
• Experiments show the proposed method is superior: it correlates highly with the cases where acc, Kendall, and MI all improve
• Extending it so it can be used for model training is an interesting direction: a differentiable formulation, plus some assumption on the data distribution (a speculative sketch follows)
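As a purely speculative illustration of that last bullet (my own idea of what a differentiable relaxation might look like, not from the paper): a precomputed CIQ table could be weighted by the model's class probabilities, so the loss is differentiable in those probabilities. NumPy is used here only to show the math; framework tensors would be needed for actual gradients.

```python
import numpy as np

def soft_cem_loss(q, g, ciq_table):
    """q: (n_items, n_classes) class probabilities (e.g., softmax outputs);
    g: gold class indices (array); ciq_table[i, j] = CIQ_ORD(c_i, c_j),
    precomputed from the gold class distribution and treated as a constant."""
    # expected CIQ of the prediction against the gold class, summed over items
    expected = (q * ciq_table[:, g].T).sum()
    # normalizer: CIQ of a perfect prediction for every item
    perfect = ciq_table[g, g].sum()
    return -(expected / perfect)  # maximizing soft CEM = minimizing this loss
```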