Software Engineers in Machine Learning Teams: Roles and Careers /devsum-2018-summer
Takahiko Ito
July 27, 2018
https://event.shoeisha.jp/devsumi/20180727
Transcript

Software Engineers in Machine Learning Teams: Roles and Careers
Takahiko Ito

Self-introduction
• Software engineer working at Cookpad Inc.
• Ph.D. (Engineering)
• Twitter account: takahi_i
• Open source: RedPen
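
Career history
• Researcher: KDD, PKDD
• Software engineer
  • Distributed frameworks: Hadoop
  • Search engines: Solr, ES, Sedue, FAST ESP
  • Research-related work: machine learning, recommendation, NLP
• Recently: machine learning group
(Timeline figure running from 2007 to 2017)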

Today's topics
• The role of software engineers in machine learning teams
• Building a machine learning career at a web company
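
Preliminaries: characteristics of machine learning projects
• How they differ from ordinary software development
  • The code alone does not tell the whole story
  • Reading the code does not reveal the behavior (the behavior is hidden inside the model)
  • The behavior depends on the input data (and the distribution of the input can change)
• It is hard to secure safety by sharing knowledge (having several members who understand the implementation)
  • Especially when the algorithm itself is difficult
In short, these are projects with many parts that need care.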
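
Recent trends
• The difficulty of introducing and managing machine learning is being handled with engineering
• Teams are organized to support data scientists
  • ML engineers, data engineers, etc.
• More software engineers are active in AI and machine learning
• So what should software engineers contribute in a machine learning team?
• Looking at the project life cycle makes the roles visible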
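
Preliminaries: the machine learning project cycle
It consists of three stages:
1. Experiment: exploratory trial and error using Jupyter Notebook
2. Code cleanup: refactoring, packaging as a library, CI
3. Deploy: turning the result into a service, CD, monitoring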
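
Preliminaries: the machine learning project cycle
• Experiment, code cleanup, and deploy go round and round
• Each turn of the cycle makes the system better: higher accuracy, a more robust system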

A bad project: the cycle does not turn and the model stays frozen
When the cycle does not turn:
• Accuracy does not improve
• Deployment cost is high
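
What is expected of engineers in a machine learning team
• Building an environment in which the project cycle can turn quickly and safely
• Solving the problems of each stage with engineering
The following slides explain the problems at each stage and how to solve them.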

Problems in the experiment stage
There are two:
1. Data acquisition
2. Compute resources
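
An anti-pattern related to data acquisition
Hiring many researchers and data scientists before building an environment in which data can be obtained easily.
Caution:
• A data scientist's output depends on the cost of obtaining data
• In an exhausting environment where obtaining data is costly, data scientists cannot produce adequate results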

What is a data scientist who cannot access data?
So to speak, a "surfer on dry land" (oka surfer).
We need to deliver water to the surfers!

Data acquisition
• Kinds of data:
  1. Database tables (e.g. users)
    • Size: megabytes to gigabytes
  2. Logs
    • Size: gigabytes to petabytes
• Data scientists need to be able to access both kinds of data by themselves

Data analytics platform
• A platform that stores huge data sets and makes them easy to extract
• There are many options:
  • Self-hosted: Hive (Hadoop), Spark, Presto
  • Hosted services: BigQuery, Redshift, TreasureData
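
Put all of the company's data on the analytics platform
• Once the analytics platform has been selected and built, load the data onto it
• Set up a mechanism that loads data automatically using log collection tools
• Loading data by hand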
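
Our case: analytics platform
• Handled by the DWH team (which exists separately from the machine learning team)
• They maintain an environment in which the data we need can be obtained easily with SQL
• Details: https://www.slideshare.net/mineroaoki/cookpad-techconf-2016-dwh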

Problems in the experiment stage
There are two:
1. Data acquisition
2. Compute resources

Compute resource exhaustion
• Several people log in to a single server and work on it
• There are not enough resources, so experiments do not progress...
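
Securing sufficient compute resources
• Provide an environment in which every member of the machine learning team can experiment comfortably
• Typical complaint: not enough GPUs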
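
Our case: compute resource management
• A Slack-based tool by @ayemos_y for managing compute resources
• Ask for a machine on Slack and it creates an EC2 instance for you
• When the machine is no longer being used, it is shut down automatically (saves money)
• Details: https://techlife.cookpad.com/entry/2017/10/26/174345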
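
The automatic shutdown rule on this slide can be sketched as a small selection function. This is only an illustration and not the tool described in the linked post: the `Instance` record, the `IDLE_LIMIT` threshold, and `select_idle_instances` are hypothetical names, and a real tool would go on to call the cloud provider's API to stop the instances it selects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Hypothetical threshold: how long a machine may sit unused before it is stopped.
IDLE_LIMIT = timedelta(hours=2)


@dataclass
class Instance:
    """Hypothetical record of one experiment machine (not a real cloud API type)."""
    instance_id: str
    owner: str
    last_used_at: datetime


def select_idle_instances(instances: List[Instance], now: datetime,
                          idle_limit: timedelta = IDLE_LIMIT) -> List[str]:
    """Return the IDs of instances that have been unused for longer than idle_limit."""
    return [i.instance_id for i in instances if now - i.last_used_at > idle_limit]


now = datetime(2018, 7, 27, 12, 0)
fleet = [
    Instance("i-aaa", "alice", now - timedelta(minutes=10)),  # still in use
    Instance("i-bbb", "bob", now - timedelta(hours=5)),       # idle for 5 hours
]
idle_ids = select_idle_instances(fleet, now)
print(idle_ids)
```

Keeping the decision in a pure function like this makes the cost-saving rule easy to test without touching any real machines.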

Problems in the code cleanup stage
There are two problems:
1. The code cannot be understood
2. There is no portability
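
The experiment script cannot be understood
• Situation: it seems to work somehow, but the code that generates the model cannot be understood
• Example: a script copy-pasted straight from a Jupyter Notebook
• Machine learning algorithms are hard, and they are even harder when the code is not organized
• The code needs to be cleaned up
First, turn the notebook into a script.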
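
Turning the notebook into a script
• Move the processing written in the Jupyter Notebook created in the experiment stage into Python scripts
• Tasks:
  • Structure the experiment flow: extract functions and classes from flat, top-to-bottom code
  • At the same time, secure robustness: refactoring, adding tests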
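
The structuring step above might look like the following minimal sketch. The toy data, the threshold "model", and all function names are placeholders chosen for illustration and are not taken from the talk; the point is only that each notebook cell becomes a named function and the flow is spelled out in `main`.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, label)


def load_data() -> List[Sample]:
    """Stand-in for the data loading cell of a notebook (toy data, hypothetical)."""
    return [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]


def train(samples: List[Sample]) -> float:
    """'Train' a toy model: the threshold halfway between the two class means."""
    negatives = [x for x, y in samples if y == 0]
    positives = [x for x, y in samples if y == 1]
    return (sum(negatives) / len(negatives) + sum(positives) / len(positives)) / 2


def predict(threshold: float, x: float) -> int:
    """Label a value as 1 when it reaches the learned threshold."""
    return 1 if x >= threshold else 0


def evaluate(threshold: float, samples: List[Sample]) -> float:
    """Return the accuracy of the threshold model on the given samples."""
    correct = sum(1 for x, y in samples if predict(threshold, x) == y)
    return correct / len(samples)


def main() -> float:
    """The experiment flow, now readable as a sequence of named steps."""
    samples = load_data()
    threshold = train(samples)
    accuracy = evaluate(threshold, samples)
    print(f"threshold={threshold:.2f} accuracy={accuracy:.2f}")
    return accuracy


if __name__ == "__main__":
    main()
```

Once the steps are separate functions, each one can be tested and replaced independently, which is what the later stages rely on.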
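
Refactoring
• Reorganizing the internal structure of source code without changing the program's externally visible behavior (from Wikipedia)
• Impression: few of the machine learning projects published on GitHub and Qiita are well organized (compared with other kinds of code)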
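
Refactoring checklist
Even basic cleanup improves readability (and lays the groundwork for CI and CD).
• Function length
• Variable scope
• Number of arguments a function takes
• Replacing magic numbers with constants
• Consolidating identical processing in one place
• Extracting deeply nested parts as functions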
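
A few of the checklist items can be shown in one short sketch: named constants instead of magic numbers, a condition extracted from a nested loop into its own function, and one shared tokenizer instead of repeated logic. The tokenizing task and every name below are hypothetical examples, not code from the talk.

```python
from typing import Dict, List

# Named constants instead of magic numbers scattered through the code.
MIN_TOKEN_LENGTH = 2
MAX_TOKENS_PER_DOCUMENT = 1000


def is_valid_token(token: str) -> bool:
    """Extracted from what would otherwise be a condition buried in nested loops."""
    return len(token) >= MIN_TOKEN_LENGTH and token.isalpha()


def tokenize(document: str) -> List[str]:
    """Lower-case, split on whitespace, keep valid tokens, and cap the length."""
    tokens = [t for t in document.lower().split() if is_valid_token(t)]
    return tokens[:MAX_TOKENS_PER_DOCUMENT]


def count_tokens(documents: List[str]) -> Dict[str, int]:
    """Count token frequencies across documents using the shared tokenizer."""
    counts: Dict[str, int] = {}
    for document in documents:
        for token in tokenize(document):
            counts[token] = counts.get(token, 0) + 1
    return counts


counts = count_tokens(["Rice and miso soup", "Miso ramen 2018 a"])
print(counts)
```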

Automated tests
• Test: code that verifies that the output for a given input is what was expected
• At a minimum: write tests for preprocessing and an end-to-end test
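
The two minimum tests named above could be sketched with the standard `unittest` module as follows. The `normalize` preprocessing step and the rule-based `classify` pipeline are hypothetical stand-ins for a real project's code; only the shape of the tests is the point.

```python
import unittest


def normalize(text: str) -> str:
    """Hypothetical preprocessing step: trim, lower-case, collapse whitespace."""
    return " ".join(text.strip().lower().split())


def classify(text: str) -> str:
    """Hypothetical end-to-end pipeline: preprocessing plus a trivial rule 'model'."""
    return "soup" if "soup" in normalize(text) else "other"


class PreprocessingTest(unittest.TestCase):
    def test_normalize_collapses_whitespace_and_case(self):
        self.assertEqual(normalize("  Miso   SOUP \n"), "miso soup")


class EndToEndTest(unittest.TestCase):
    def test_raw_input_to_label(self):
        self.assertEqual(classify("  Miso   SOUP "), "soup")
        self.assertEqual(classify("Fried rice"), "other")


def run_tests() -> unittest.TestResult:
    """Run both test cases explicitly so the sketch works in any runner."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(PreprocessingTest))
    suite.addTests(loader.loadTestsFromTestCase(EndToEndTest))
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

In a real project the same test classes would simply be collected by the CI job instead of being run through `run_tests`.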
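
Benefits of tests
• Tests = specification
  • Even if you write documentation, it drifts out of sync over time
  • Tests that run in CI do not drift
• Writing them helps whoever takes over the project understand it
• Modifying machine learning code that has no tests is frightening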

Other tasks in the code cleanup stage
• Introduce a workflow tool (make, Luigi)
• Add a logger
• Strengthen the documentation
• Set up a continuous integration environment
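
Anti-pattern: division of labor at the code cleanup stage
(Diagram: exploratory experiments, code cleanup, model deployment)
• Danger: engineers clean up and deploy the experiments that researchers validated
• Handing over machine learning code costs more than handing over ordinary programs
• There is a risk that nobody understands the project any more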
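
Our case: tackling code cleanup in pairs
• Pair a researcher with an engineer when cleaning up code
• Work while sharing knowledge of the code
• Even out coding ability across team members
(Diagram: exploratory experiments, code cleanup, model deployment)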

Problems in code cleanup
There are two problems:
1. The code cannot be understood
2. There is no portability
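
The environment for running the script cannot be built
• Scripts that handle machine learning depend on many libraries
• They also depend on tools written in languages other than Python (MeCab, etc.)
• Each stage (experiment, cleanup, deploy) runs in a different environment (machine), so it is painful when the execution environment is not portable...
  • Example: a script that worked fine locally does not work on the production server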

Solution: introduce Docker
• A lightweight virtualization environment
• Dependencies other than Python libraries can also be described in the Dockerfile
• Project portability improves
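
Virtualizing the environment with Docker
• Ideal: work in Docker from the experiment stage onward
• You get an environment in which the script reliably runs even when the stage changes
• As a result, the project cycle becomes easier to turn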

However... Docker is a bit complicated
The commands are long... (TдT)
You also have to remember the port forwarding, image name, and container name for each project (TдT)

Example: Docker commands
• Building the Docker image
  • docker build -t ml-image -f ./docker/Dockerfile .
• Creating the Docker container
  • docker run -it -v `pwd`:/work -p 8888:8888 --name ml-container ml-image
• The same commands have to be typed again every time the container is removed and recreated...
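
Our case: Cookiecutter Docker Science
• We created a Cookiecutter template that supports everything from experiments to deployment in a Docker environment
• It is an open source project (we would be glad if you tried it)
  • URL: https://docker-science.github.io/
  • Source: https://github.com/docker-science/cookiecutter-docker-science
• Cookiecutter: a tool that generates projects from templates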
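
Features: Cookiecutter Docker Science
• Makes Docker easy to handle even for members without strong engineering skills
  • Hides the Docker commands behind make targets
  • Image name, ports, file mount settings, recreating containers, etc.
• Generates a directory layout designed with experiments, cleanup, and deployment in mind
  • A common directory layout makes every project easier to find your way around
  • Modeled on the layout of Cookiecutter Data Science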
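
The idea of keeping the long Docker invocations in one place can also be sketched in Python. This is not how Cookiecutter Docker Science does it (the slides say it uses make targets); `ProjectSettings`, `build_command`, and `run_command` are hypothetical helpers, and only the Docker flags that appear in this talk (`-t`, `-f`, `-it`, `-v`, `-p`, `--name`) are used.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectSettings:
    """Hypothetical per-project settings that would otherwise have to be memorized."""
    image_name: str
    container_name: str
    host_port: int = 8888
    dockerfile: str = "./docker/Dockerfile"
    workdir: str = "/work"


def build_command(settings: ProjectSettings) -> List[str]:
    """Compose the `docker build` argument list for this project."""
    return ["docker", "build", "-t", settings.image_name, "-f", settings.dockerfile, "."]


def run_command(settings: ProjectSettings, project_dir: str) -> List[str]:
    """Compose the `docker run` argument list, mounting the project directory."""
    return [
        "docker", "run", "-it",
        "-v", f"{project_dir}:{settings.workdir}",
        "-p", f"{settings.host_port}:8888",
        "--name", settings.container_name,
        settings.image_name,
    ]


settings = ProjectSettings(image_name="ml-image", container_name="ml-container")
print(" ".join(build_command(settings)))
print(" ".join(run_command(settings, os.getcwd())))
```

The argument lists are only printed here; passing them to `subprocess.run` would execute them on a machine where Docker is installed.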
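
Unified file and directory layout
(Annotations on the directory tree figure)
• make init downloads the data from S3
• Holds the models output by the training script
• Holds the notebooks used for experiments
• Holds the methods and classes created during code cleanup
• Describes the project workflow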

How to use Cookiecutter Docker Science (project generation)
$ cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git
project_name [project_name]: image-classification
project_slug [image_classification]:
jupyter_host_port [8888]:
description [Please Input a short description]: Classify images into several categories
data_source [Please Input data source in S3]: s3://research-data/food-images

Demo: Cookiecutter Docker Science
• Generating a project
  • https://asciinema.org/a/6XV9dNixtzfUwWdoqLj7HG7A2
• Creating the Docker image and container
  • https://asciinema.org/a/06CcXPubAj3RSiMSTy3CZDrfG
• Launching Jupyter Notebook
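
Problems in the deploy stage
• Machine learning results have to be deployed by building servers or through batch scripts
• Deploying machine learning results smoothly requires a good platform (infrastructure)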
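
The cost of deploying machine learning results
• The productivity of a machine learning team depends on the organization's infrastructure technology
• If building servers in the production environment is costly, the results of machine learning cannot be reflected in the service
• The baseline: members of the machine learning team can build servers in production themselves by opening a pull request on GitHub or GHE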
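
Our case: making deployment more efficient
• Server management
  • Uses ECS (work with Kubernetes has also started)
  • Maintained by the infrastructure department
• An environment is provided in which the machine learning team can build servers in production by itself
  • Required work: just add a configuration file to the repository (contents: authentication, server specs, etc.)
• chie8842 is driving the standardization of configuration across machine learning projects
  • Small projects use the shared configuration
    • Authentication, storage for intermediate data, etc.
• Simplifying the deployment flow
  • Reducing the number of settings that have to be written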
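
One way to picture the shared configuration idea is a set of team-wide defaults that each project overrides only where it differs. The sketch below is an assumption for illustration: the keys, the values, and `merge_config` are hypothetical and do not describe the actual configuration format used at the company.

```python
from typing import Any, Dict

# Hypothetical shared defaults for every machine learning project.
SHARED_DEFAULTS: Dict[str, Any] = {
    "auth": {"method": "internal-sso"},
    "intermediate_data": {"bucket": "ml-intermediate"},
    "server": {"cpu": 1, "memory_mb": 2048},
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay project-specific overrides on the shared defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# A small project only states what differs from the shared settings.
project_overrides = {"server": {"memory_mb": 8192}}
config = merge_config(SHARED_DEFAULTS, project_overrides)
print(config["server"])
```

With this shape, a small project's configuration file shrinks to the handful of settings it actually needs to change.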

Our case: Kelner
• An OSS project developed by @_lunardog_
• A tool for deploying deep learning models extremely quickly
• URL: https://github.com/lunardog/kelner
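
Summary: the role of engineers in a machine learning team
• Using machine learning in a service involves many parts that need care
  • Algorithms, data acquisition, code quality, deployment, etc.
• The experiment stage demands research ability; code cleanup and the later stages demand engineering ability
• Because a strict division of labor is hard to establish, it is important to increase the overlap between each person's skills (reviews, sharing knowledge through pair programming, etc.)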

Today's topics
• The role of software engineers in machine learning teams
• Building a machine learning career at a web company
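
Careers differ by team size
• Even among ML teams, career development differs depending on the size of the team
• It is best to choose the size that matches the kind of engineer you want to become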
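
Members of a small ML team
• One person handles a wide variety of tasks
• Contributes to every part of the machine learning project cycle
  • Data collection, code cleanup, server construction, monitoring
• Becomes multi-stack
• Has little time to dig deep into machine learning algorithms themselves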
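
Members of a large ML team
• A division of labor is in place
• You can build experience focused on a specialty
• Examples:
  • Researcher: proposing new algorithms, papers
  • Infrastructure engineer: introducing and extending an elegant ML deployment framework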
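
However, in either case there is a technology stack that every member should have at a minimum
• Somewhat advanced use of Git
  • rebase, stash, combining commits (squash), bisect, etc.
• Virtualization: Docker
• Caring for code quality: test-driven development
• Issue management: Jira, Redmine
When every member can do these, the project is easier to manage.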
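
Our case: the members who make up the machine learning team
• Medium-sized (8 people in total)
• Broadly two kinds of roles:
  1. Machine learning engineer: machine learning plus integration into the service (strengths differ from person to person: some lean toward research, some toward engineering)
  2. Infrastructure engineer: builds the machine learning platform
• In addition, the analytics platform team and the company-wide infrastructure team outside our team give us a lot of support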
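
Our case: roles of machine learning team members
• At our size, the organization becomes inefficient if everyone does only the work that matches their role
• Roles are shared quite fluidly; people actively take on work outside their main role
  • Example: a member whose main job is infrastructure takes on data analysis
• Intent: widen the range over which each member can contribute to the organization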
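
The difficulty of building a career in machine learning (R&D)
• How it differs from infrastructure and service development
  • A service can exist without machine learning itself
  • Machine learning is a technology that makes a service better
• Unless you consciously produce results that match what the company expects, the team easily becomes dead weight → team dissolved \(^o^)/
• Demand fluctuates heavily with the environment, so a consistent career is hard to build...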
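
Survival strategies for machine learning team members
• The organization is unstable, so it is wise to keep a survival strategy in mind to some degree
• Prepare so that you do not lose your value even if demand for ML/AI declines (temporarily)
• There are two directions (becoming multi-stack, or breaking through on a single point)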
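
Survival strategy 1: acquire related technologies
• Introducing machine learning smoothly requires a variety of related technologies
• Learn the technologies related to machine learning as well and become multi-stack
• It helps to have strengths in related fields outside machine learning
  • Example: takahi-i "can do a little" (chotto dekiru) with search engines (Solr, Elasticsearch)
• Be able to contribute to the company outside machine learning too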

Some of the technologies used around machine learning
(Diagram with machine learning at the center)

Storage technologies
• The results output by a learner are stored in storage and then used by the service
• Search engines: Solr, Elasticsearch
• Databases: MySQL, PostgreSQL
• Start by being able to design schemas and tables
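
As a small illustration of designing a table for learner output, the sketch below stores predictions keyed by item and model version and reads one back the way a service might. It uses the standard `sqlite3` module so that it runs anywhere; the table name, columns, and data are hypothetical, and a production system would use a database such as those listed above.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE recipe_predictions (
        recipe_id     INTEGER NOT NULL,
        model_version TEXT    NOT NULL,
        label         TEXT    NOT NULL,
        score         REAL    NOT NULL,
        PRIMARY KEY (recipe_id, model_version)
    )
    """
)
rows = [
    (1, "v1", "soup", 0.92),
    (2, "v1", "salad", 0.71),
    (1, "v2", "soup", 0.97),
]
conn.executemany("INSERT INTO recipe_predictions VALUES (?, ?, ?, ?)", rows)
conn.commit()

# The service reads one prediction per recipe; here we take the row whose
# version string sorts last, which is "v2" for recipe 1.
latest = conn.execute(
    """
    SELECT label, score FROM recipe_predictions
    WHERE recipe_id = ?
    ORDER BY model_version DESC
    LIMIT 1
    """,
    (1,),
).fetchone()
print(latest)
```

Keeping the model version in the primary key lets a new model write its results next to the old ones, so the service can switch versions without losing earlier output.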
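
Infrastructure technologies
• A variety of infrastructure technologies can be used to make introducing machine learning into a service more efficient
  • Workflow management: Airflow
  • Configuration management: Ansible, Chef
  • Server management: Docker, Kubernetes, ECS
• When the infrastructure is managed as code, machine learning engineers can do this kind of work themselves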
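
Technologies used in service development
• After machine learning annotation results are stored in the database, integration work is needed to use them in the service
• MVC frameworks (Rails, etc.)
  • Start by writing models and controllers yourself
• Front-end adjustments: JavaScript, CSS
  • These evolve quickly and are hard to keep up with, but learning them makes it easier to ship into the service
  • Examples: ES6, TypeScript, React, Vue, webpack, etc.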
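
Technologies used with the analytics platform
• Beginner: get the data for machine learning from the analytics platform (SQL)
• Intermediate: use a variety of big data frameworks
  • Gain experience writing code not only in Python and SQL but also on distributed frameworks (Hadoop, Spark)
  • Implement and publish algorithms as a hobby to deepen understanding
  • Something I built long ago (an LSH implementation): https://github.com/takahi-i/likelike
• Ideal: contribute to improving the analytics platform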
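
Survival strategy 2: go deep into machine learning
• Even after the hype settles, demand for top-level people will always exist
• Show that you are first-rate in machine learning
  • Get accepted at a top conference regularly (once every few years)
    • NIPS, ICML, ICCV, CVPR, COLT, etc.
  • Kaggle grand master
Caution: people with master's and doctoral degrees in scientific disciplines are moving into data scientist jobs. It is hard to survive with half-baked understanding and output.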
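
Summary
• Explained the role of engineers in a machine learning team
  • Solve the problems of each project stage (experiment, cleanup, deploy)
  • Related teams: the analytics platform and infrastructure teams play important roles
  • Introduced several cases from our company
• Introduced how to build a machine learning career at a web company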
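
Hiring: application engineers for the R&D department
• We are hiring
• This position bridges research results and Cookpad's production services
• Developing real services and applications, not PoCs
• Areas: machine learning, smart kitchen
• For details, please read: https://cookpad.wd3.myworkdayjobs.com/en-US/jobs/job/Tokyo--Japan/--_R-001087-31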

Thank you for your attention.