Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
기계학습을 활용한 게임 어뷰징 검출
Search
JeongJu Kim
August 16, 2016
Technology
1
980
기계학습을 활용한 게임 어뷰징 검출
PyConAPAC 2016에서 발표한 문서입니다.
JeongJu Kim
August 16, 2016
Tweet
Share
More Decks by JeongJu Kim
See All by JeongJu Kim
IPython과 Pandas를 활용한 게임데이터 분석 - PyConKR 2014
haje01
7
1.8k
Other Decks in Technology
See All in Technology
PHPカンファレンス名古屋-テックリードの経験から学んだ設計の教訓
hayatokudou
2
290
N=1から解き明かすAWS ソリューションアーキテクトの魅力
kiiwami
0
130
データの品質が低いと何が困るのか
kzykmyzw
6
1.1k
室長と気ままに学ぶマイクロソフトのビジネスアプリケーションとビジネスプロセス
ryoheig0405
0
360
Nekko Cloud、 これまでとこれから ~学生サークルが作る、 小さなクラウド
logica0419
2
970
関東Kaggler会LT: 人狼コンペとLLM量子化について
nejumi
3
580
急成長する企業で作った、エンジニアが輝ける制度/ 20250214 Rinto Ikenoue
shift_evolve
3
1.3k
Developers Summit 2025 浅野卓也(13-B-7 LegalOn Technologies)
legalontechnologies
PRO
0
720
Goで作って学ぶWebSocket
ryuichi1208
0
630
Data-centric AI入門第6章:Data-centric AIの実践例
x_ttyszk
1
400
技術的負債解消の取り組みと専門チームのお話 #技術的負債_Findy
bengo4com
1
1.3k
CZII - CryoET Object Identification 参加振り返り・解法共有
tattaka
0
370
Featured
See All Featured
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
StorybookのUI Testing Handbookを読んだ
zakiyama
28
5.5k
Rails Girls Zürich Keynote
gr2m
94
13k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
44
7k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.7k
Typedesign – Prime Four
hannesfritz
40
2.5k
A Philosophy of Restraint
colly
203
16k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
233
17k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
6
550
Speed Design
sergeychernyshev
27
790
How GitHub (no longer) Works
holman
314
140k
Transcript
ӝ҅णਸ ഝਊೠ ѱ য࠭ Ѩ ӣ PyCon APAC 2016 PyCon
APAC 2016 1
ߊ ࣗѐ ӣ (
[email protected]
) : ѱ ѐߊ - NHN /
NPLUTO - 3D ূ / ѱ ۄ ѐߊ അ: ѱ ؘఠ ࣻ / ࠙ࢳ - Webzen NPlay - ۽Ӓ ನਕ؊, Pandas, Scikit-Learn, PySpark PyCon APAC 2016 2
ߊח 4 ӝ҅णী ೠ ӝࠄ ध ח ٜ࠙ਸ ࢚
4 ॆਸ ഝਊೠ ؘఠ ࠙ࢳҗ ӝ҅ण ࢎ۹ܳ ҕਬ 4 ѐߊҗ ࢲ࠺झী ӝ҅णਸ بੑೞח ҅ӝо غਵݶ פ PyCon APAC 2016 3
द زӝ 4 ѱ য࠭ ઁܳ 4 ਬ नҊ /
GM ݽפఠ݂ / ಁఢ ӝ۽ח ೠ҅ 4 ࢎۈ ѐੑ ୭ࣗചػ য࠭ ఐ दझమਸ ٜ݅ PyCon APAC 2016 4
ѱ য࠭ۆ? 4 “ӝദਵ۽ بೞ ঋ ߑधਵ۽ ѱ ࠁܳ
ദٙೞѢա ب ਸ ח ೯ਤ” ! 4 ࢎ۹ 4 ࢲ࠺झ ҳഅ࢚ ਸ ਊೠ ۨ 4 ೧ఊ ోਸ ࢎਊೠ ࠺࢚ ۨ 4 ହী بߓ۽ ҟҊ PyCon APAC 2016 5
ా҅৬ ఐ࢝ ؘఠ ࠙ࢳ PyCon APAC 2016 6
ࢶ, ా҅ 4 ా҅ח ೂࠗೞ ޅೠ ؘఠ৬ ஹೊ ਕ ജ҃ীࢲ
ߊ 4 ా҅ ٜ ؘఠ/҅ਸ ח ߑߨਸ োҳ 4 ৌঈೠ ജ҃ীࢲ ٜ݅যӝী, ؘఠীࢲب оܳ ߊѼೡ ࣻ 4 ӝࠄੋ ా҅ ध ѐߊ, ӝദ, ࢲ࠺झ ١ী ب ؽ PyCon APAC 2016 7
ఐ࢝ ؘఠ ࠙ࢳ 4 ؘఠী ऀযח ࠁܳ, 4 নೠ пب۽
ਃড, दпച ೧ࠁݴ ח җ ! 4 ೞח ؘఠח җࠗఠ 4 दझమ(WzDat) ѐߊ೧ ഝਊ " 4 Jupyter + Utility + Dashboard 4 https://github.com/haje01/wzdat 4 http://www.pycon.kr/2014/program/14 PyCon APAC 2016 8
ࢎ۹1 рױೠ ా҅ ই٣য۽ झಁݠ Ѩ PyCon APAC 2016 9
࢚ട 4 नӏ য়ೠ ѱ ହ ѱ ইమ ҟҊӖ۽ оٙ
! 4 ೧ ҅ਸ ઁ೧ب ߄۽ ࢜ ҅ਵ۽ ҟҊ ҅ࣘ 4 ࡅܲ ઁо ਃೞৈ, ӝ҅णਸ ೯ೞӝীח दр ࠗ PyCon APAC 2016 10
ਸ ਊೠ झಅ (Spam) 4 ѱ ղীࢲ ਵ۽ ݠפ/ইమ
౸ݒ ҟҊ 4 য࠭ח ۽Ӓ۔ ౸߹ਸ ݄ӝਤ೧ ݫ दܳ դةച PyCon APAC 2016 11
झಁݠ Ѩ 4 নೠ ߑߨ оמೞѷਵա, 4 োয ܻա ӝ҅णэ
Ҋә Ӕࠁ, 4 рױೠ ా҅ ই٣য۽ दب PyCon APAC 2016 12
ৡۄੋ ݫद ӡ ࠙ನ 4 ੌ߈ਵ۽ ۽Ӓ ӏ࠙ನܳ
ٮܲҊ ঌ ۰ઉ. 4 ইېח NPS Chat Corpus ݫद ӡ ࠙ನ PyCon APAC 2016 13
ѱ ղ ݫद ӡ ࠙ನ 4 ৡۄੋ җ ࠺तೞա
ખ ؊ فԁ ҃ ೱ 4 ౠ ӡ ݫदо (?) → झಅਵ۽ ഛੋ PyCon APAC 2016 14
ই٣য 4 ੌ߈ ਬ: ݫद ӡо নೞҊ, ࠼بо ݆ ঋ
4 झಁݠ: ݫद ӡо নೞ ঋҊ, ࠼بח ֫ 4 , যڃ ਬ ࠼بо ֫Ҋ ӡо নೞ ঋਵݶ झಁݠ PyCon APAC 2016 15
рױೠ Ѩ ҕध 4 ਬ ߹ പࣻ / ݫद
ӡ ઙܨ ࣻ 4 ࠺तೠ ӡ ݫदܳ ࠁյ ࣻ۾ ч ழ PyCon APAC 2016 16
࠙ܨ 4 spam_ratioо ӝળ ч ࢚ੋ Ѫਸ झಁݠ۽ р 4
ӝળ ч Ѿ ോܻझ౮ೞѱ... 4 ࢸ റ, ࠙ܨػ நܼఠ ݫद ഛੋਵ۽ ч ઑ PyCon APAC 2016 17
࠙ܨ റ ݫद ӡ ࠙ನ 4 ࠼بо ֫ ౠ ӡ
ݫद(= झಅ)о ܻ࠙غ PyCon APAC 2016 18
Ѿҗ ਊ 4 ҳഅ рױ೮݅, য়ఐ оמࢿ 4 ӝળ
чਸ ֫ѱ ই न܉بܳ ֫ 4 Ѿҗܳ оҊ ઁ PyCon APAC 2016 19
ѐࢶ ߑೱ 4 ӝળ ч Ѿਸ ખ ؊ җੋ ߑߨਵ۽
4 োয ܻ ӝࣿ(NLP) بੑ 4 ױয߹ ࠼ب(Ziff’s Law)৬ ਃب(TF-IDF) Ҋ۰ 4 ӝ҅ण ঌҊ્ܻ ਊ PyCon APAC 2016 20
ӝ҅ण ࣗѐ PyCon APAC 2016 21
ӝ҅णਸ ॳח ਬ 4 ֢۱ਵ۽ ҡଳ Ѿҗޛ 4 নೠ
ޙઁী ೠ ੌ߈ੋ ࣛܖ࣌ 4 ࣻ ౠࢿ(ೖ)ਸ زदী Ҋ۰ೡ ࣻ 4 ؘఠ ߸زী ъೣ(ъѤࢿ) PyCon APAC 2016 22
࠙ܨ৬ ഥӈ 4 ӝ҅ण ѱ ࠙ܨ (Classification)৬ ഥӈ (Regression)۽ ա
4 ࠙ܨ - ઙܨܳ ஏ ೞח Ѫ 4 ഥӈ - োࣘػ чਸ ஏ ೞח Ѫ 4 য࠭ Ѩ ࠙ܨী ࣘೣ PyCon APAC 2016 23
ب णҗ ਯ ण 4 ب ण(Supervised Learning) 4 ӝઓ
҃ী ೧ ࠙ܨػ ࢠ ؘఠо ਸ ٸ 4 ਯ ण(Unsupervised Learning) 4 ࠙ܨػ ࢠ ؘఠо হਸ ٸ 4 ࠗ࠙ ؘఠח ࠙ܨغয ঋ → ಽযঠೡ ޙઁ PyCon APAC 2016 24
ӝ҅ण ঌҊ્ܻٜ 4 ӝࠄ 4 ܻפয/۽झ౮ ܻӒۨ࣌(Linear/Logistic Regression) 4 Ѿ
ܻ(Decision Tree) 4 Ҋә 4 ےؒ ನۨझ(Random Forest) 4 SVM(Support Vector Machine) 4 ੋҕ न҃ݎ(Neural Network) PyCon APAC 2016 25
ঌҊ્ܻ ࢶఖ? 4 ੌ߈ਵ۽ Ҋә ঌҊ્ܻ ؊ ࠂೠ ݽ؛ ण
оמ 4 Ӓ۞ա, Ҋә ঌҊ્ܻ ޖઑѤ જ Ѫ ইש 4 ण Ѿҗܳ ࢎۈ ೧ೞӝীח ӝࠄ ঌҊ્ܻ જ PyCon APAC 2016 26
ஏী ೠ ಣо 4 ഛࢿী ೠ о ਃ ! 4
Q: ਬ 100ݺ 2ݺ ח য࠭ܳ Ѩೞ۰ ೠ. पࣻ۽ ݽف ࢚ ਬ۽ ౸ױ೮ਸ ٸ ഛبח? 4 A: 100ݺ 2ݺ ౣ۷ਵפ… 98% !?#@ PyCon APAC 2016 27
ஏ ױਤ 4 ب(Precision) അਯ(Recall)җ ١ নೠ ױਤ 4 ب:
Ѫ ݃ա য࠭ੋо? 4 അਯ: য࠭ ݃ա ওחо? 4 ؘఠо ࠛӐഋ(Imbalance)ੌٸח ౠ ب৬ അਯਸ ೣԋ Ҋ۰೧ঠ 4 খ ҃ח അਯ 0 PyCon APAC 2016 28
P/R Curve ৬ AUC જ ࠙ܨӝח? PyCon APAC 2016 29
ࢎ۹2 ӝ҅णਵ۽ ߁ Ѩ PyCon APAC 2016 30
࢚ട 4 ۄ࠳ ѱীࢲ пઙ ೧ఊ ోਸ ࢎਊೠ ߁ ۨо
ഝѐ ! 4 ߁: ѱ ղ ചܳ ࠺ ࢚ੋ ߑߨਵ۽ णٙ 4 ࠈ ౠࢿਸ ೞա ل۽ ౠೞӝ য۰ → ӝ҅ण ਃ PyCon APAC 2016 31
ण ߑध ࢶఖ 4 Ҷ ۡ֔/٩۞ਵ۽ ೡ ਃח হח ٠…
4 җѢ ۽Ӓо غҊ Ҋ, 4 ஏীࢲ ӝઓ য࠭ நܼఠ ܻझܳ оҊ ! → ӝ҅ण, ౠ ب ण оמ! 4 Decision Tree ߑध ب णਵ۽ Ѿ PyCon APAC 2016 32
ળ࠺ җ 1. ۽Ӓ ࣻ ࢚క ഛੋ 2. ۽Ӓ ҳઑ/
ঈ 3. णਸ ਤೠ ೖ(Feature) ୶ PyCon APAC 2016 33
ӝ҅णب ۽Ӓ ࣻࠗఠ 4 ۽Ӓܳ ҅ਵ۽ ݽਵח Ѫب औ ঋ
4 ࠙ࢳ/णী Ѧܻח दр 10~20% ب 4 ؘఠܳ ݽਵҊ оҕೞחؘ ࠗ࠙ दр Ѧܽ. 4 ۽Ӓ ഋध оә Ӓ۽ ࢎਊ (झౚ٣য়ܳ ਤ೧… !) 4 ۽Ӓܳ ࠙ܨ೧ (ࢲߡ/۽Ӓ ઙܨ, द ߹۽) 4 ۄ٘ झషܻ(S3) ୶ୌ ☁ PyCon APAC 2016 34
ਦب ࢲߡীࢲ ۽Ӓ ࣻೞӝ 4 ѱ ࢲߡח ࠗ࠙ ਦب ӝ߈
4 য় ࣗझ જ ోٜ(fluentd, logstash ١)ਸ ॳҊ रਵա 4 ਦب ࢲߡী ࢸо औ ঋҊ, ੌࠗ ӝמ ࠗ 4 ѐߊ ! 4 https://github.com/haje01/wdfwd 4 ࢲߡী թ ۽Ӓ ੌਸ RSync۽ زӝೞѢա 4 ѱ DBী ࣘೞৈ Dump റ ࣠ PyCon APAC 2016 35
۽Ӓо ࣻ غਵݶ ೖܳ ٜ݅ 4 ೖ(Feature, ౠࢿ): ण ࢚
ౠਸ ࢸݺ೧ח ч 4 ) чਸ ஏೞח ҃ ! → ӝ, ߑೱ, ജ҃, Үా, ಞदࢸ ١ ೖ PyCon APAC 2016 36
ೖ ѐߊ(Feature Engineering) 4 (࠺)ഋ ؘఠীࢲ ೖܳ Ҋ ࢤࢿೞח স
4 ܲ ೖٜী ղػ ೖܳ ইղӝب ೣ 4 ٸ۽ח ࠂೠ ٘о ਃ(SQL۽ח ൨ٝ) 4 3ѐਘ ࠙ ۽Ӓীࢲ ೞنਸ ా೧ ೖ ࢤࢿ PyCon APAC 2016 37
ೞنਸ ॄঠ݅ ೞա? 4 ؘఠо Bigೞ ঋਵݶ ਃ হ 4
न… 4 ߓ Jobਸ য়ۖزউ جܻѢա 4 ӝਵ۽ ETLਸ ా೧ DBী ֍যفח җ ਃೡ ࣻ 4 ࠺ഋ/ਊ ؘఠীࢲ ࠼ߣೠ ೖ ѐߊਸ ೠݶ જ PyCon APAC 2016 38
যڌѱ ॄঠೞա? 4 ೞن ۞झఠܳ ҳ୷ೞৈ ࢎਊೡ ࣻب ਵա,
ࣇҗ ਊ য۰ 4 ۄ٘ ࢲ࠺झীࢲ ઁҕೞח ೞن ࢲ࠺झܳ ਊ ! - AWS EMR(Elastic Map Reduce) PyCon APAC 2016 39
AWSח ࠺ऱ ঋա? 4 ୭ച ೞݶ ࠺ऱ ঋ ! 4
ਃೡ ٸ݅ ॳח ױࣘ ۞झఠ(Transient Cluster)۽ ਊ 4 Task ֢٘ח ҃ݒ ߑध Spot Instance۽ 4 m4.xlarge(4 vCPU, 16 GiB RAM ): दр 0.036$ (ࢲ ܻ, 2016-08-09 ӝળ) PyCon APAC 2016 40
AWS EMR ۞झఠ द ചݶ PyCon APAC 2016 41
ೞنਸ ਤೠ ۽Ӓ оҕ 4 ೞن ੌ(< 100MB)ٜ ݆
Ѫী ஂড 4 ੌٜ ߽, ࣗ, ୷ೡ ਃ 4 ݃ٶೠ ోਸ ޅ೧ ѐߊ ! 4 https://github.com/haje01/mersoz 4 ߄Ո ੌ݅ স, ઓ ҙ҅ܳ Ҋ۰ೠ ߽۳ ܻ PyCon APAC 2016 42
ݠ, ࣗ & ୷ റ S3ী ػ ۽Ӓ PyCon APAC
2016 43
ೞن MapReduce ٬ - mrjob 4 Yelpীࢲ ݅ٚ Python ಁః
4 ೞن झܿਸ ਊ೧ ॆਵ۽ MR ٬ 4 ۽ஸীࢲ ࢠ ؘఠ۽ ѐߊೠ റ, EMRী ৢܿ ! 4 प೯ ࣘبח Javaߡ ࠁ ખ וܻ݅ ѐߊ ࣘبо ࡅܴ PyCon APAC 2016 44
from mrjob.job import MRJob import re WORD_RE = re.compile(r"[\w']+") class
MRWordFreqCount(MRJob): def mapper(self, _, line): # ۽Ӓ ੌ п ۄੋ for word in WORD_RE.findall(line): # ݽٚ ױযী ೧ yield word.lower(), 1 # 'ױয', 1 ߈ജ def combiner(self, word, counts): # ֢٘ Ѿҗܳ ஂ yield word, sum(counts) def reducer(self, word, counts): # ۞झఠ Ѿҗܳ ஂ yield word, sum(counts) if __name__ == '__main__': MRWordFreqCount.run() PyCon APAC 2016 45
दझమ ҳࢿب PyCon APAC 2016 46
അട ঈ 4 ӝ҅णਸ ਤ೧ 4 GM ઁೞח ӔѢ(=ೖ)৬ 4
ઁػ நܼఠ ܻझܳ ਃ PyCon APAC 2016 47
ೖ ࢤࢿ 4 ۽Ӓীࢲ நܼఠ ӝળਵ۽ ҳೣ 4 Үೠ
ೖࠁח নೠ ೖܳ 4 যରೖ ࠂਵ۽ ౸ױ 4 ୡӝীח ૣ दрী ೧, উചغݶ ӡѱ PyCon APAC 2016 48
ୡӝী ࡳইࠄ ೖٜ 4 ۽Ӓੋ ࣻ 4 ۨ दр 4
۽Ӓ ইਓ ࠛ࠙ݺೠ ҃о ݆ 4 ࣁ࣌ ইਓ بੑ: 5࠙ ⏱ 4 ইమ/ݠפ णٙ ࣻ 4 ௮झ ઙܐ ࣻ 4 NPC/PC р ై ࣻ PyCon APAC 2016 49
ೖ ఋੑ? 4 ѱ पࣻ ഋ, పҊܻ ഋ, ࠛܽ(Boolean) ഋਵ۽
աׇ 4 оә पࣻ ഋਵ۽ ాੌೞח Ѫ ߄ۈ 4 Bool 0, 1۽ 4 పҊܻ ఋੑ OneHotEncoderܳ ࢎਊ೧ पࣻഋਵ۽ PyCon APAC 2016 50
ٜ݅য ೖ 4 ױࣽ ఫझ (.txt) ੌ 4 நܼఠݺ
+ ೖ ߓৌ ഋध PyCon APAC 2016 51
ӝ҅ण ೯ PyCon APAC 2016 52
ӝ҅ण о߶ 4 ୭ઙ ೖ ੌ ӝо Ҋ, ӝ҅ण
ࣻ೯ب о߶ ಞ 4 ۽ஸ PCীࢲ ࣻ೯ 4 ୶ୌ दझమۢ ݽٚ ؘఠܳ ࠊঠೞח ण ޖѢ Ѫ 4 ݽ؛ਸ ࢶఖೞҊ ୭ ೞಌ ಁ۞ఠܳ Ѿೞח Ѫ җઁ 4 নೠ ࣇਵ۽ ৈ۞ߣ प೧ࠊঠ 4 ࠙ दझమਸ ഝਊೞח ҃ب... PyCon APAC 2016 53
যڃ ঌҊ્ܻ ݽ؛ਸ ࢶఖೡ Ѫੋо? 4 द рױೠ Ѫਵ۽ 4
࠺तೠ ࢎ۹ ࢶ೯ োҳо ਵݶ ଵҊೞ 4 AUCա ROCܳ ాೠ ݽ؛ ಣо ߂ ࢶఖ PyCon APAC 2016 54
Decision Tree۽ द 4 ࠂೞ ঋҊ ౸ױ җ ೧о ਊ
4 ॆ Scikit-Learn ಁః Ѫਸ ࢎਊ 4 নೠ ӝ҅ण ঌҊ્ܻਸ प ઁҕ 4 ੋఠಕझо ాੌغয য ݽ؛ Үо ਊ 4 ೖ(X)৬ য࠭ ৈࠗ(y)ܳ ֍Ҋ ण 4 DTח ೖ ӏച ਃ হয ಞܻ PyCon APAC 2016 55
DT ࢎਊ (ࠠԢ ࠙ܨ) from sklearn.datasets import load_iris from
sklearn import tree iris = load_iris() clf = tree.DecisionTreeClassifier() clf = clf.fit(iris.data, iris.target) >>> clf.predict(iris.data[:1, :]) array([0]) PyCon APAC 2016 56
PyCon APAC 2016 57
Decision Tree ण җ 1. ೖ ੌীࢲ ӝઓ য࠭ ೖܳ
Ҋ 2. زࣻ ࢚ ਬ ೖ ҳೣ 4 Under Sampling 3. ؘఠܳ Train/Test ࣇਵ۽ ա־Ҋ 4. ӝࠄ ಁ۞ఠ۽ ण द PyCon APAC 2016 58
ୡӝ Ѿҗ 4 ಣӐ ഛب 80% ب 4 Binary Class
࠙ܨ ҃ ࣻо ੜ աয়ח ಞ 4 աࢁ ঋѪ э݅, 4 ஏ Ѿҗо ઁ ӔѢ۽ ॳੋח ীࢲ ݆ ࠗ PyCon APAC 2016 59
ഛبܳ ৢܻ 4 Үର Ѩૐ(Cross Validation)ਸ ਤ೧ ؘఠ ࣇਸ ܻ࠙
ೞҊ 4 GridSearchCVܳ ా೧ ୭ ೞಌ ಁ۞ఠܳ 4 ಣӐ ഛب 91%۽ ೱ࢚ 4 যڃ ӝળਵ۽ ౸ױೞח ೠ ߣ ࠁҊ र tree.export_graphviz۽ Ӓ۰ࠆ PyCon APAC 2016 60
PyCon APAC 2016 61
Ѿ ܻܳ ࠁפ... 4 णػ ݽ؛ যڃ ӝળਵ۽ ౸ױೞח ঌ
ࣻ → নೠ ҵ ࢎۈٜী ҕਬ оמ ! 4 ೞࠗ۽ ղ۰т ࣻ۾ ࠂ೧ח ޙઁ 4 DTח җ(Overfitting)غӝ औӝী, Depthо ցޖ Ө ঋѱ PyCon APAC 2016 62
ৈӝࢲ ؊ ࢚ ࣻо ৢۄо ঋ 4 GMשҗ ࢚ റ
࢜۽ ೖٜ ୶о 4 زदী ইమ/ݠפ ࣻ 4 ݗ ߈ࠂ പࣻ 4 ౠ ېझ݅ ࢶఖ 4 ঋҊ ইమਸ ࣻ 4 դ೧೧ ࠁח Ѫٜب ೖ۽ ٜ݅ ࣻ ח Ѫ ֢ೞ 4 ) 'ࠈ ےؒೞѱ ࢤࢿػ ܴਸ оҊ যਃ'' PyCon APAC 2016 63
) நܼఠ ܴ ےؒࢿ ౸ױ (/ݽ അ ಁఢ) ## நܼఠ
ܴ ߊ оמೠ ౸ױೞח गب ٘ # ܴਸ ݽ बࠅ۽ ߄Է(1о , 2о ݽ) # ) anything -> ‘21211211’ symbols = get_cv_symbols(char_name) # җ э ಁఢ ਵݶ ߊ оמ (प۽ח ؊ ন) if ‘2121’ or ‘2112’ or ‘1121’ or ‘22122’, … in symbols: can_pron = False else: can_pron = True PyCon APAC 2016 64
ഛೠ ߑߨ ইפ݅... ࠂਵ۽ ౸ױೞӝী ب ؽ PyCon APAC 2016
65
୶о ೖ۽ झযо ೱ࢚, Ӓ۞ա… 4 ಣӐ ഛب 96%۽ ೱ࢚.
ࣻח ֫ ಞ݅, 4 प ਊ೧ࠄ Ѿҗ 4 GMש ഛੋ җীࢲ য়ఐ Ԩ ա১ ! 4 DecisionTree Ҋੋ җ ޙઁ۽ ౸ױ PyCon APAC 2016 66
Random Forest۽ Ү 4 ݆ Decision Tree ܳ ઑೠ ঔ࢚࠶
పץ 4 ࣻ DTܳ ࠙ ण(=ӏച ബҗ) दఃҊ ైೞח ߑध 4 ࣻо ծইب উੋ Ѿҗ 4 DecisionTree - ࠛউೠ 96% RandomForest - উੋ 95% PyCon APAC 2016 67
Random Forest ण 4 ӝࠄਵ۽ Decision Tree৬ ࠺त 4 max_depth,
min_samples_leaf ݽ؛ ࠂبܳ ઑ. ѱ द೧ࢲ ઑӘঀ ఃਕࠄ 4 n_estimator 4 աޖ(DT)ܳ ݻ Ӓܖ बਸ Ѫੋ Ѿ ! 4 ցޖ ݶ णदр ӡҊ, ցޖ ਵݶ Ӓր DTо غযߡܿ PyCon APAC 2016 68
RF ਊ റ Ѿҗ 4 ഛبח 95% 4 ࠗೞѱ ҅
߉ח ࢎ۹о হب۾ 4 predict_probaܳ ࢎਊ೧ ஏ ഛܫب Ҋ 4 ഛܫ ֫(>70%) ஏ Ѿҗ݅ ನೣ 4 ৈӝࢲ 10~20%ب അਯ(Recall) ೞۅ ୶ 4 Ӓ۞ա, ب(Precision)ח… PyCon APAC 2016 69
100% ׳ࢿ GMש ࣻসਵ۽ Ѩష೧ न Ѿҗ… ! PyCon APAC
2016 70
ওਵפ ઁܳ... 4 2ѐਘৈী Ѧ ઁ 4 ోਸ ࢎਊೠ ߁
ࠗ࠙ ࢎۄ! ! 4 ӝ/ࣘਵ۽ ઁܳ ೧ঠ ബҗо PyCon APAC 2016 71
ଵҊ: ୭ઙ ೖ ਃب PyCon APAC 2016 72
ѐࢶ ߑೱ 4 Ѩػ Ѿҗܳ ਊ೧ ण ݽ؛ ѐࢶ 4
ࠈ ҅ী ೠ PIIܳ ࣻ೧فݶ नӏ ࠈ णী ਊೡ Ѫ 4 ઁ റ ߸ઙ ࠈ ݽפఠ݂ ਃ PyCon APAC 2016 73
റӝ PyCon APAC 2016 74
ו՛ 4 ؘఠ ࣻࠗఠ оҕ, ࠙ࢳө ݽٚ җਸ ॆਵ۽
! 4 Jupyter ֢࠘ਸ ాೠ ఐ࢝ ؘఠ ࠙ࢳ " 4 ؊ নೠ ࠙ঠী ӝ҅ णਸ ഝਊ оמೡ ٠ PyCon APAC 2016 75
ӝ҅ण बച 4 Ө ח ഝਊਸ ਤ೧ ӝࠄ ۿਸ ؊
ҕࠗೞ ! 4 જ Hypothesisܳ ٜ݅ ࣻ ѱ ػ 4 ୭ചܳ ೡ ࣻ ѱ ػ 4 ೞա ࢚ ঌҊ્ܻਸ ࢎਊ೧ ࠁ 4 SVM, Neural Net ١ নೠ ࠙ܨӝ 4 Super Learner ߑधਵ۽ ঔ࢚࠶ PyCon APAC 2016 76
ࣁਘ ൗ۞... ࢜۽ ۽Ӓ ࣻ/࠙ࢳ ജ҃ 4 RSync ߑध ->
Fluentd/Kinesis पदр ۽Ӓ ࣻ 4 gzipػ CSV -> Parquet ನݘਵ۽ S3 4 Columnar ߄ցܻ ನݘ, 30x ࣘب ೱ࢚ 4 MRJob -> PySpark 4 ъ۱ೠ ࠙ ܻ / Cache ӝמ(߈ࠂ णী ъ) 4 ױࣘ Spark ۞झఠ(20 VMs = 80য, 320GB ۔)۽ ਊ (दр 3000ਗ ب) PyCon APAC 2016 77
ઑ 4 ӝ҅ण ղо ೞ۰ח ੌী ೠ ౸ױ ! 4
য࠭ ౠࢿ ױࣽೞݶ ాੋ ߑߨਵ۽ оמ 4 ఐ࢝ ؘఠ ࠙ࢳਸ ా೧ ౠࢿਸ ݢ ঈೞ 4 নೠ ݽ؛/ೖܳ పझ೧ࠁ 4 ण ݽ؛ী ٮۄ ೖ ӏച/Үചо ਃೡ ࣻ ਵפ 4 ېझр Imbalance ޙઁী PyCon APAC 2016 78
٩۞? ӝ҅ण? 4 ٩۞ 4 Үೠ ೖ ূפয݂ ਃ হ
4 ݆ ಁ۞ఠ = ݆ ؘఠо ਃ 4 ӝ҅ण 4 ೖ স ਃೞ݅ 4 ಁ۞ఠ = ؘఠ۽ب ബ җ PyCon APAC 2016 79
࢚ 4 ؘఠ ূפয݂ য۰ 4 ؘఠ ഛࠁо о ਃ
4 झನۄܳ ߉ח ࠙ঠח য়۰ ݎ যف 4 োҳо ইפۄݶ ҷڣস/࢜ ؘఠঠ݈۽ ࠶ܖয়࣌ 4 ݽٚ ഥࢎী ؘఠ ࠙ࢳоо ਃೠ द 4 ஹೊఠо ݽٚ ݽ؛/߸ࣻ ઑਸ పझ ೡ ࣻ ݶ? ! PyCon APAC 2016 80
ਵ۽... ࢎ োҙ(Spurious Correlations) 4 पઁ۽ח োҙ হ݅, ח Ѫۢ
ࠁח ҃ 4 ؘఠী݅ ೞ ݈Ҋ, بݫੋਸ ೧ೞ! PyCon APAC 2016 81
хࢎפ. PyCon APAC 2016 82
ଵҊ ݂ 4 http://www.aladin.co.kr/shop/wproduct.aspx?ItemId=28946323 4 http://www.tylervigen.com/spurious-correlations 4 http://scikit-learn.org/stable/modules/tree.html 4 http://www.cimerr.net/conference/board/data/conference/1331626266/P15.pdf
4 http://stackoverflow.com/questions/20463281/- how-do-i-solve-overfitting-in-random-forest-- of-python-sklearn 4 http://stats.stackexchange.com/questions/131255/class-imbalance-in-supervised-machine-learning 4 https://www.quora.com/Is-Scala-a-better-choi- ce-than-Python-for-Apache-Spark 4 http://statkclee.github.io/data-science/data- -handling-pipeline.html 4 https://databricks.com/blog/2016/01/25/deep-- learning-with-spark-and-tensorflow.html- PyCon APAC 2016 83