Detecting Game Abuse with Machine Learning
JeongJu Kim
August 16, 2016
Slides presented at PyCon APAC 2016.
Transcript

Detecting Game Abuse with Machine Learning
JeongJu Kim
PyCon APAC 2016

About the speaker
JeongJu Kim ([email protected])
Previously: game development
- NHN / NPLUTO
- 3D engine / game client development
Currently: game data collection / analysis
- Webzen NPlay
- Log forwarder, Pandas, Scikit-Learn, PySpark
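
This talk
- Is aimed at people with basic knowledge of machine learning
- Shares case studies of data analysis and machine learning with Python
- Will hopefully be an occasion to introduce machine learning into your development and services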

Motivation
- Sanctioning game abuse
- User reports, GM monitoring, and pattern-based approaches have limits
- Let's build an abuse detection system that minimizes human intervention
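
What is game abuse?
- "Acts of acquiring in-game information or gaining help in ways the design did not intend"
- Examples
  - Play that exploits flaws in the service implementation
  - Abnormal play using hacking tools
  - Flooding the chat window with advertisements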

Statistics and Exploratory Data Analysis
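
First, statistics
- Statistics developed in an environment without abundant data or computing power
- Statisticians studied methods that need little data and little computation
- Because it was built for harsh conditions, it can find value even in small data
- Basic statistical knowledge is a great help in development, planning, service operation, and more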
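
Exploratory data analysis
- The process of finding information hidden in data
- by summarizing and visualizing it from various angles
- It starts with locating the data you want
- Developed and used an in-house system (WzDat)
  - Jupyter + Utility + Dashboard
  - https://github.com/haje01/wzdat
  - http://www.pycon.kr/2014/program/14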

Case 1: Detecting Spammers with a Simple Statistical Idea
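
Situation
- The chat window of a newly launched game was filled with ads for game items
- Even when an account was sanctioned, the ads continued right away from a new account
- Fast sanctions were needed, so there was not enough time to set up machine learning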

Spam via chat
- Ads selling in-game money/items through in-game chat
- Abusers obfuscate their messages to evade programmatic detection
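
Detecting spammers
- Many approaches are possible, but
- rather than advanced approaches such as natural language processing or machine learning,
- we tried a simple statistical idea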

Length distribution of online chat messages
- Generally known to follow a log-normal distribution
- Below: the message length distribution of the NPS Chat Corpus

Length distribution of in-game chat messages
- Similar to online chat, but tends to be somewhat thicker
- Messages of certain lengths stand out (?) → confirmed to be spam
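
Idea
- Normal users: message lengths vary, and frequency is not high
- Spammers: message lengths do not vary, and frequency is high
- That is, a user whose frequency is high and whose message lengths do not vary is a spammer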
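
A simple detection formula
- Per user: number of chat messages / number of distinct message lengths
- The more often a user sends messages of similar length, the larger the value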
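
Classification
- Users whose spam_ratio is at or above a threshold are regarded as spammers
- The threshold is chosen heuristically...
- After setting it, adjust the value by checking the messages of the classified characters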
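
The detection formula and threshold rule from the two slides above can be sketched in a few lines of Python. Only the name spam_ratio comes from the slides; the sample chat log, the user names, and the threshold of 3.0 are made up for illustration.

```python
from collections import defaultdict

def spam_ratios(messages):
    """Return {user: number of chat messages / number of distinct message lengths}."""
    counts = defaultdict(int)
    lengths = defaultdict(set)
    for user, text in messages:
        counts[user] += 1
        lengths[user].add(len(text))
    return {user: counts[user] / len(lengths[user]) for user in counts}

def find_spammers(messages, threshold):
    """Users whose spam_ratio is at or above the (heuristic) threshold."""
    return sorted(u for u, r in spam_ratios(messages).items() if r >= threshold)

# Hypothetical chat log: (user, message) pairs.
# The seller obfuscates the text, but every message keeps the same length.
chat_log = [
    ("alice", "hi"),
    ("alice", "anyone up for a raid?"),
    ("alice", "brb"),
    ("seller01", "CHEAP G0LD w w w . x . c0m"),
    ("seller01", "CHEAP GOLD w w w . x . c0m"),
    ("seller01", "CHEAP G0LD w_w_w . x . c0m"),
    ("seller01", "CHEAP G0LD w w w . x . com"),
]

ratios = spam_ratios(chat_log)
spammers = find_spammers(chat_log, threshold=3.0)
```

Because the ratio only looks at message lengths, obfuscating the characters of an ad does not help the spammer as long as the length stays the same.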

Message length distribution after classification
- High-frequency messages of specific lengths (= spam) were separated out
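
Applying the results
- The implementation was simple, but false positives are possible
- Set the threshold high to raise reliability
- Applied sanctions based on the results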
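
Directions for improvement
- Choose the threshold in a more scientific way
- Introduce natural language processing (NLP) techniques
  - Consider per-word frequency (Zipf's law) and importance (TF-IDF)
- Apply machine learning algorithms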

Introduction to Machine Learning
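
Why use machine learning
- Decent results for little effort
- A general solution for a variety of problems
- Can consider many characteristics (features) at the same time
- Robust to variation in the data (robustness)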
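
Classification and regression
- Machine learning is broadly divided into classification and regression
- Classification: predicting a category
- Regression: predicting a continuous value
- Abuse detection is a classification problem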

Supervised and unsupervised learning
- Supervised learning
  - When there is sample data already labeled from prior experience
- Unsupervised learning
  - When there is no labeled sample data
- Most data is unlabeled → a problem that has to be solved

Machine learning algorithms
- Basic
  - Linear/Logistic Regression
  - Decision Tree
- Advanced
  - Random Forest
  - SVM (Support Vector Machine)
  - Neural Network
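
Choosing an algorithm?
- In general, advanced algorithms can learn more complex models
- However, advanced algorithms are not unconditionally better
- Basic algorithms are better when people need to understand the learned result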

Evaluating predictions
- Accuracy needs a definition
- Q: We want to detect the 2 abusers among 100 users. If we mistakenly judge all of them to be normal users, what is the accuracy?
- A: 2 wrong out of 100, so… 98%!?
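
Metrics
- There are various metrics, such as precision and recall
- Precision: of those flagged, how many are actually abusers?
- Recall: of the abusers, how many were found?
- When the data is imbalanced, precision and recall in particular must be considered together
- In the previous example, recall is 0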
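
The 100-user example from the slides can be worked through directly. This is a minimal sketch using only the standard library; returning 0.0 for precision when nothing is flagged is a convention chosen here, not something the slides specify.

```python
def confusion(y_true, y_pred):
    """Count true/false positives/negatives; label 1 means 'abuser'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0  # nothing flagged: report 0.0

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# 100 users, 2 of them abusers; the classifier calls everyone normal.
y_true = [1, 1] + [0] * 98
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)   # looks excellent
rec = recall(y_true, y_pred)     # but no abuser was found
prec = precision(y_true, y_pred)
```

Accuracy alone rewards the do-nothing classifier, which is why the slides ask for precision and recall on imbalanced data.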

P/R Curve and AUC
What makes a good classifier?

Case 2: Detecting Farming with Machine Learning
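
Situation
- In a live game, farming play using various hacking tools was rampant
- Farming: acquiring in-game goods by abnormal means
- A bot's characteristics are hard to pin down to one or two traits → machine learning is needed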
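
Choosing a learning approach
- There seems to be no real need for neural networks / deep learning…
- Past logs are being stored, and
- the operations side has a list of known abusing characters → machine learning, and supervised learning in particular, is possible
- Decided on supervised learning with a Decision Tree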

Preparation
1. Check the state of log collection
2. Understand the log structure and meaning
3. Extract features for learning
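
Machine learning also starts with log collection
- Collecting logs systematically is not easy
- Analysis/learning takes only about 10~20% of the time
  - Most of the time goes into collecting and processing data
- Use the log format as-is where possible (for the studio's sake…)
- Store logs in categories (by server / log type, and by date and time)
- Cloud storage (S3) recommended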
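
Collecting logs from Windows servers
- Most game servers are Windows-based
- We would like to use good open source tools (fluentd, logstash, etc.), but
- they are not easy to install on Windows servers and lack some features
- Developed our own
  - https://github.com/haje01/wdfwd
  - Syncs log files left on the server via RSync, or
  - connects to the game DB, dumps it, and transfers the dump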
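
Once the logs are collected, build features
- Feature: a value that describes a characteristic of the thing being learned
- e.g. when predicting house prices → size, orientation, surroundings, transportation, amenities, etc. are the features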
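
Feature engineering
- The work of finding and generating features from (un)structured data
- Sometimes uncovers features latent in other features
- Sometimes requires complex code (hard with SQL)
- Generated features with Hadoop from 3 months of logs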
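
Do we have to use Hadoop?
- Not needed if the data is not big
- Instead…
  - you may have to run batch jobs for a long time, or
  - load the data into a DB periodically via ETL
- A good fit if you frequently develop features from unstructured / large-volume data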

How should we use it?
- You can build your own Hadoop cluster, but setup and operation are hard
- Use the Hadoop service offered by a cloud provider
  - AWS EMR (Elastic MapReduce)

Isn't AWS expensive?
- Not if you optimize
- Run it as a transient cluster that exists only when needed
- Use auction-priced Spot Instances for Task nodes
- m4.xlarge (4 vCPU, 16 GiB RAM): $0.036 per hour (Seoul region, as of 2016-08-09)

AWS EMR cluster launch screen
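
Processing logs for Hadoop
- Hadoop handles large numbers of small files (< 100MB) poorly
- Files need to be merged, sorted, and compressed
- Could not find a suitable tool, so developed our own
  - https://github.com/haje01/mersoz
  - Works only on changed files; parallel processing that respects dependencies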

Logs stored in S3 after merging, sorting & compression
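
Hadoop MapReduce coding: mrjob
- A Python package made by Yelp
- Write MapReduce jobs in Python using Hadoop Streaming
- Develop locally with sample data, then push to EMR
- Execution is somewhat slower than the Java version, but development is faster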

    from mrjob.job import MRJob
    import re

    WORD_RE = re.compile(r"[\w']+")


    class MRWordFreqCount(MRJob):

        def mapper(self, _, line):  # each line of the log file
            for word in WORD_RE.findall(line):  # for every word
                yield word.lower(), 1  # emit ('word', 1)

        def combiner(self, word, counts):  # aggregate the results within a node
            yield word, sum(counts)

        def reducer(self, word, counts):  # aggregate the results across the cluster
            yield word, sum(counts)


    if __name__ == '__main__':
        MRWordFreqCount.run()

System architecture diagram

Assessing the current state
- For machine learning, we requested
  - the grounds on which GMs issue sanctions (= features), and
  - the list of sanctioned characters
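
Generating features
- Computed per character from the logs
- Prefer a variety of features over finely crafted ones
  - The judgment is made on the combination anyway
- Initially over a short time window; longer once things stabilize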
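
Features extracted at first
- Number of logins
- Play time
  - Logout is often unclear
  - Introduced a session timeout: 5 minutes
- Number of item/money acquisitions
- Number of quests completed
- Number of NPC/PC battles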
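
One way to turn the 5-minute session timeout into a play-time feature is sketched below. The slides only state that a 5-minute session timeout was introduced; treating any gap longer than the timeout as "not playing" is an assumption, and the timestamps are invented.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=5)  # value taken from the slide

def play_time(event_times, timeout=SESSION_TIMEOUT):
    """Sum the gaps between consecutive log events, skipping gaps longer
    than the timeout (those are treated as the end of a session)."""
    ordered = sorted(event_times)
    total = timedelta()
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        if gap <= timeout:
            total += gap
    return total

# Hypothetical log timestamps for one character, with no logout events at all
events = [
    datetime(2016, 8, 1, 10, 0),
    datetime(2016, 8, 1, 10, 2),
    datetime(2016, 8, 1, 10, 4),
    datetime(2016, 8, 1, 10, 30),  # 26-minute silence: a new session starts
    datetime(2016, 8, 1, 10, 33),
]

total_play = play_time(events)
```

This avoids depending on logout records, which the slide notes are often missing.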

Feature types?
- Broadly: real-valued, categorical, and Boolean
- Preferably unify everything as real-valued
  - Bool as 0 and 1
  - Categorical types converted to real values with OneHotEncoder
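
To make the conversion concrete, here is a hand-rolled sketch of what one-hot encoding does. It is a stand-in written with the standard library, not scikit-learn's OneHotEncoder itself, and the feature names (play_hours, is_guild_member, character class) are hypothetical.

```python
def one_hot(values):
    """Return (sorted categories, one 0.0/1.0 row per input value)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0.0] * len(categories)
        row[index[v]] = 1.0
        rows.append(row)
    return categories, rows

def to_real_vector(play_hours, is_guild_member, class_row):
    """Combine a real value, a Boolean (as 0/1) and a one-hot row into one vector."""
    return [float(play_hours), 1.0 if is_guild_member else 0.0] + class_row

# Hypothetical categorical feature: character class
classes = ["mage", "warrior", "mage", "archer"]
categories, class_rows = one_hot(classes)

first_character = to_real_vector(12.5, True, class_rows[0])
```

Each category becomes its own 0/1 column, so no artificial ordering between classes is introduced.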

The resulting features
- A plain text (.txt) file
- Format: character name + feature array

Running Machine Learning
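
Machine learning is lightweight
- The final feature file is small, and the learning itself is fairly light
  - Ran on a local PC
  - Learning that has to look at all the data, as in recommender systems, would be heavy
- The real task is selecting a model and determining the optimal hyperparameters
  - You need to experiment many times with various settings
  - Distributed systems are sometimes used for this...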

Which algorithm/model to choose?
- Start with something simple
- If there is prior research on a similar case, refer to it
- Evaluate and select models using AUC or ROC
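
Starting with a Decision Tree
- Not complex, and the decision process is easy to understand
- Used the one in Python's Scikit-Learn package
  - Provides a solid range of machine learning algorithms
  - The unified interface makes swapping models easy
- Train by supplying the features (X) and whether each is abuse (y)
- DT needs no feature normalization, which is convenient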

DT usage example (iris classification)

    from sklearn.datasets import load_iris
    from sklearn import tree

    iris = load_iris()                      # load the iris dataset
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(iris.data, iris.target)   # train on features and labels

    >>> clf.predict(iris.data[:1, :])       # predict the class of the first sample
    array([0])
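
Decision Tree training process
1. Find the features of known abusers in the feature file
2. Take the features of an equal number of normal users
   - Under-sampling
3. Split the data into train/test sets
4. Start training with default parameters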
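
Steps 2 and 3 above can be sketched with the standard library alone. The feature rows, the 25% test ratio, and the fixed seeds are assumptions made for illustration; in practice scikit-learn's own splitting utilities would normally be used.

```python
import random

def undersample(abuser_rows, normal_rows, seed=0):
    """Keep all abusers and draw an equal number of normal users at random."""
    rng = random.Random(seed)
    sampled = rng.sample(normal_rows, k=len(abuser_rows))
    X = abuser_rows + sampled
    y = [1] * len(abuser_rows) + [0] * len(sampled)
    return X, y

def split_train_test(X, y, test_ratio=0.25, seed=0):
    """Shuffle the indices and cut them into train and test parts."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    pick = lambda seq, ids: [seq[i] for i in ids]
    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)

# Hypothetical feature rows: [logins, play_hours]
abusers = [[30 + i, 20.0] for i in range(20)]
normals = [[1 + i % 5, 2.0] for i in range(500)]

X, y = undersample(abusers, normals)
X_train, X_test, y_train, y_test = split_train_test(X, y)
```

Balancing the classes before the split keeps accuracy from being dominated by the much larger group of normal users.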

Initial results
- Average accuracy of about 80%
- Binary classification tends to score well
- It does not look bad, but
- it falls well short given that the predictions are used as grounds for sanctions

Raising accuracy
- Split the dataset for cross validation, and
- searched for the optimal hyperparameters with GridSearchCV
- Average accuracy improved to 91%
- To see what criteria it decides on, drew the tree with tree.export_graphviz
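
Looking at the decision tree...
- You can see what criteria the trained model decides on → it can be shared with people in various roles
- The problem is that it gets more complex the further down you go
- DTs overfit easily, so keep the depth from getting too deep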
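
The score stopped improving at this point
- After consulting the GMs, added new features
  - Number of item/money […] at the same time
  - Number of map repetitions
  - Selecting only a specific class
  - Number of items […] without […]
- The know-how lies in turning even obscure-looking observations into features
  - e.g. "Bots have randomly generated names"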
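
e.g. Judging the randomness of a character name (consonant/vowel occurrence pattern)

    ## Pseudocode that judges whether a character name is pronounceable
    # Convert the name into consonant/vowel symbols (1 = consonant, 2 = vowel)
    # e.g. anything -> '21211211'
    symbols = get_cv_symbols(char_name)
    # If any of the following patterns is present, the name is pronounceable
    # (in practice there are more patterns: …)
    if any(p in symbols for p in ('2121', '2112', '1121', '22122')):
        can_pron = True
    else:
        can_pron = False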
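
A runnable version of this idea might look as follows. The slide does not show get_cv_symbols, so the implementation here is a guess: it handles ASCII letters only and counts 'y' as a vowel so that the slide's example ('anything' -> '21211211') is reproduced. It also assumes, following the slide's comment, that the listed patterns mark a name as pronounceable; the pattern list is the partial one from the slide.

```python
VOWELS = set("aeiouy")  # assumption: 'y' is a vowel, matching 'anything' -> '21211211'
PRONOUNCEABLE_PATTERNS = ("2121", "2112", "1121", "22122")  # partial list from the slide

def get_cv_symbols(name):
    """Map each letter to '1' (consonant) or '2' (vowel); other characters are skipped."""
    return "".join(
        "2" if ch in VOWELS else "1"
        for ch in name.lower()
        if ch.isalpha()
    )

def can_pronounce(name):
    """True if the consonant/vowel string contains any known pronounceable pattern."""
    symbols = get_cv_symbols(name)
    return any(p in symbols for p in PRONOUNCEABLE_PATTERNS)

example_symbols = get_cv_symbols("anything")
readable = can_pronounce("anything")
random_looking = can_pronounce("xkqzrtpw")
```

The result is a single Boolean feature per character; it is crude on its own and is meant to be combined with the other features.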

Not a precise method, but... it helps because the judgment is made on a combination of features
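
The extra features improved the score, but…
- Average accuracy improved to 96%. The score is on the high side, but
- when actually applied,
  - quite a few false positives surfaced during the GMs' verification
- Judged to be the Decision Tree's chronic overfitting problem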
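
Switching to Random Forest
- An ensemble technique that combines many Decision Trees
- Trains many DTs on distributed data (= a regularization effect) and has them vote
- Stable results even if the score is lower
  - DecisionTree: an unstable 96%
  - RandomForest: a stable 95%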
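
Training a Random Forest
- Basically similar to a Decision Tree
- max_depth and min_samples_leaf control model complexity; start small and increase little by little
- n_estimators
  - Decides how many trees (DTs) to plant
  - Too many and training takes long; too few and it just becomes a DT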
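
Results after applying RF
- Accuracy of 95%
- So that no account is sanctioned unfairly,
  - also consulted the prediction probability via predict_proba
  - included only predictions with a high probability (> 70%)
  - estimated a drop in recall of about 10~20% from this
- However, precision…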
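
The effect of keeping only high-probability predictions can be illustrated with a small sketch. The probabilities below are invented and stand in for the abuse-class column that a classifier's predict_proba would return; the 0.7 cut-off is the one mentioned on the slide, and 0.5 is used as the comparison point.

```python
def confident_positives(names, abuse_probs, threshold=0.7):
    """Keep only characters whose predicted abuse probability exceeds the threshold."""
    return [n for n, p in zip(names, abuse_probs) if p > threshold]

def precision_recall(flagged, actual_abusers):
    """Precision and recall of a flagged list against the known abusers."""
    flagged = set(flagged)
    tp = len(flagged & actual_abusers)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(actual_abusers) if actual_abusers else 0.0
    return precision, recall

# Hypothetical characters, predicted abuse probabilities, and ground truth
names = ["a", "b", "c", "d", "e", "f"]
abuse_probs = [0.95, 0.72, 0.55, 0.65, 0.30, 0.10]
actual_abusers = {"a", "b", "c"}

p_default, r_default = precision_recall(
    confident_positives(names, abuse_probs, threshold=0.5), actual_abusers)
p_strict, r_strict = precision_recall(
    confident_positives(names, abuse_probs, threshold=0.7), actual_abusers)
```

In this toy data the stricter cut-off removes the one false positive at the cost of missing one real abuser, which mirrors the trade-off described on the slide: recall drops, precision rises.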

100% achieved
According to the GMs' manual review of the results…
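
Now that they are found, on to sanctions...
- Sanctions carried out over a little more than 2 months
- Tool-based farming has mostly disappeared!
- Sanctions have to be applied regularly and continuously to be effective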

Reference: final feature importances
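
Directions for improvement
- Improve the trained model using the detected results
- Collecting PII for bot accounts would be useful for learning about new bots
- Bot variants need to be monitored after sanctions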

Afterword
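
Impressions
- Every step, from data collection through processing to analysis, done in Python
- Exploratory data analysis with Jupyter notebooks
- Machine learning could probably be put to use in many more areas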
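
Going deeper into machine learning
- Study the underlying theory more, for deeper use
  - You become able to build good hypotheses
  - You become able to optimize
- Try using more than one algorithm
  - Various classifiers such as SVM and Neural Net
  - Ensemble them in the Super Learner style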
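
Time passed... a new log collection/analysis environment
- RSync approach -> real-time log collection with Fluentd/Kinesis
- gzipped CSV -> stored in S3 in Parquet format
  - Columnar binary format, 30x speedup
- MRJob -> PySpark
  - Powerful distributed processing / cache feature (a strength for iterative learning)
  - Run as a transient Spark cluster (20 VMs = 80 cores, 320GB RAM), about 3,000 KRW per hour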
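
Advice
- Judge whether machine learning suits what you are trying to do
  - If the abuse characteristics are simple, traditional methods will do
  - Understand the characteristics first through exploratory data analysis
- Test a variety of models/features
  - Depending on the model, features may need normalization/refinement, so take care
  - Watch out for imbalance between classes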
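
Deep learning? Machine learning?
- Deep learning
  - No need for elaborate feature engineering
  - Many parameters = needs a lot of data
- Machine learning
  - Feature work is needed, but
  - few parameters = effective even with little data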
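
Thoughts
- Data engineering is hard
- Securing data matters most
- Fields in the spotlight may actually have dimmer prospects
  - Unless you are doing research, smokestack industries / new data are the real blue ocean
- An era in which every company needs a data analyst
- What if computers could test every combination of models/variables?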

Finally... Spurious Correlations
- Cases that look related when in fact there is no relationship
- Don't rely on the data alone; understand the domain!

Thank you.

Reference links
- http://www.aladin.co.kr/shop/wproduct.aspx?ItemId=28946323
- http://www.tylervigen.com/spurious-correlations
- http://scikit-learn.org/stable/modules/tree.html
- http://www.cimerr.net/conference/board/data/conference/1331626266/P15.pdf
- http://stackoverflow.com/questions/20463281/how-do-i-solve-overfitting-in-random-forest-of-python-sklearn
- http://stats.stackexchange.com/questions/131255/class-imbalance-in-supervised-machine-learning
- https://www.quora.com/Is-Scala-a-better-choice-than-Python-for-Apache-Spark
- http://statkclee.github.io/data-science/data-handling-pipeline.html
- https://databricks.com/blog/2016/01/25/deep-learning-with-spark-and-tensorflow.html