Can't I Go Down as the First to Build a Crawler with Headless Chrome?
yujiosaka
February 22, 2018
Transcript
Yuji Isobe
Can't I Go Down as the First to Build a Crawler with Headless Chrome?
Node Gakuen (Node学園), 29th session
Project manager at emin, @yujiosaka https://speakerdeck.com/yujiosaka/hitasurale-sitedeipuraningu
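This talk: I built a crawler
✓ Why a crawler, now of all times?
✓ What I was aiming for when I built it
✓ What I was thinking about while building it
✓ Where crawlers go from here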
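Last year I did all sorts of things…
Serialized column in ECZine http://eczine.jp/article/detail/4869
Debut as an e-commerce expert http://amzn.asia/aOkwFjH
Which left me with a problem (´・ω・｀)
At work, people stopped seeing me as an engineer orz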
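The line I must not cross
Tuning the product for each client company ← "Well, I am an AI engineer"
Going along on sales visits ← "Technical sales, I guess"
Proposing new products ← "That's BizDev, right?"
Starting to write sales materials ← "Uh, okay…"
Starting to write press releases ← "That guy isn't an engineer anymore"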
If I can, I want to make my living as an engineer for the rest of my life
Win back my dignity as an engineer at work
Then one day…
I learned about headless Chrome https://developers.google.com/web/updates/2017/04/headless-chrome?hl=ja
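What is headless Chrome?
✓ Chrome can be started in headless mode
✓ Just add "--headless" to Chrome's launch options
✓ The best-known headless browser so far has been PhantomJS
✓ It runs fast and stably
✓ It picks up new standards quickly (ES2017 async/await is available)
✓ Main uses are test automation and dynamic crawlers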
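As a rough illustration of the launch option mentioned above, the sketch below only builds the argument list for a headless start. The binary name `google-chrome` and the helper names `headlessArgs` and `launchHeadless` are assumptions for illustration; the binary differs per platform, and `--disable-gpu` and `--dump-dom` are additional Chrome flags beyond the one the slide names.

```javascript
// Sketch: build the command line for starting Chrome in headless mode.
// 'google-chrome' is an assumed binary name; adjust it for your platform.
const { spawn } = require('child_process');

function headlessArgs(url, { dumpDom = false } = {}) {
  const args = ['--headless', '--disable-gpu'];
  if (dumpDom) args.push('--dump-dom'); // print the rendered DOM to stdout
  args.push(url);
  return args;
}

// Not called automatically: starting a browser is a side effect.
function launchHeadless(url, binary = 'google-chrome') {
  return spawn(binary, headlessArgs(url, { dumpDom: true }), { stdio: 'inherit' });
}

console.log(headlessArgs('https://example.com', { dumpDom: true }).join(' '));
```

Calling `launchHeadless('https://example.com')` would start the browser only if a matching binary is on the PATH.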
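Static crawlers vs. dynamic crawlers (※ my own naming)
✓ Static crawlers (wget, curl)
✓ Request only the document (the HTML file)
✓ Only parse the file, so they run fast
✓ Do not work on SPA sites built with AngularJS, React, or Vue.js
✓ Dynamic crawlers (PhantomJS, headless Chrome)
✓ Load images, JavaScript, and CSS and go as far as rendering
✓ Also execute JavaScript, so they are generally slower
✓ Work on SPA sites the same way as on conventional sites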
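The gap between the two kinds of crawler can be seen without a browser at all. The sketch below is only an illustration: both HTML strings are made-up samples, `visibleLinks` is a hypothetical helper, and the regex is a naive stand-in for a real parser. It shows what a static crawler can extract from the raw response of a server-rendered page versus an SPA shell.

```javascript
// Sketch: what a static crawler "sees" in raw HTML.
// The sample pages and the regex-based extraction are illustrative assumptions.
function visibleLinks(html) {
  // A static crawler never runs scripts, so drop them before looking for links.
  const withoutScripts = html.replace(/<script[\s\S]*?<\/script>/gi, '');
  const links = [];
  const re = /<a\s[^>]*href="([^"]*)"/gi;
  let match;
  while ((match = re.exec(withoutScripts)) !== null) links.push(match[1]);
  return links;
}

const serverRendered = '<body><a href="/about">About</a><a href="/news">News</a></body>';
const spaShell = '<body><div id="app"></div><script src="/bundle.js"></script></body>';

console.log(visibleLinks(serverRendered)); // links are already in the raw HTML
console.log(visibleLinks(spaShell));       // nothing to find until JavaScript runs
```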
Chrome DevTools Protocol https://chromedevtools.github.io/devtools-protocol/
✓ The latest specification is a set of JSON files in the Chromium source
✓ They are copied to the GitHub repository once an hour
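For a sense of what the protocol looks like on the wire, the sketch below builds one command message. `Page.navigate` with a `url` parameter is a real protocol method; the helper name `cdpCommand` and the id counter are assumptions for illustration, and the WebSocket transport is left out.

```javascript
// Sketch: the shape of a Chrome DevTools Protocol command.
// Commands are JSON objects with an id, a "Domain.method" name, and params.
// The transport (the browser's debugging WebSocket) is omitted here.
let nextId = 0;

function cdpCommand(method, params = {}) {
  nextId += 1; // each command carries a unique id so replies can be matched
  return JSON.stringify({ id: nextId, method, params });
}

console.log(cdpCommand('Page.navigate', { url: 'https://example.com' }));
```

Working at this level means tracking ids and replies by hand, which is the kind of low-level handling the later slides complain about.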
Benchmark https://hackernoon.com/benchmark-headless-chrome-vs-phantomjs-e7f44c6956c
RIP PhantomJS https://groups.google.com/forum/#!topic/phantomjs/9aI5d-LDuNE
If you are starting now, use headless Chrome
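But the problems pile up
✓ The API is too low-level and hard to work with
✓ The specification is still unstable and hard to keep up with
✓ You run into security blocks
✓ Protections meant for users, such as Content Security Policy, kick in
✓ You casually step on bugs
✓ The setRequestInterception implementation is still experimental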
GoogleChrome / puppeteer https://github.com/GoogleChrome/puppeteer
✓ Maintained by the Google Chrome team
✓ A wrapper that drives headless Chrome through a high-level API
✓ v1.0.0 was released in January
✓ There is a Slack group, and responses are careful and fast
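To give a feel for the high-level API, here is a minimal sketch that reads a page title. The calls `launch`, `newPage`, `goto`, `title`, and `close` are Puppeteer methods; the function name `fetchTitle` and the choice to pass the library in as an argument are assumptions made so the sketch can be tried with a stand-in object when Puppeteer is not installed.

```javascript
// Sketch: read a page title through a Puppeteer-style API.
// `puppeteer` is injected; with the real package, pass require('puppeteer').
async function fetchTitle(puppeteer, url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    await browser.close(); // always release the browser process
  }
}
```

With the package installed, `fetchTitle(require('puppeteer'), 'https://example.com').then(console.log)` would be one way to run it.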
This "crawler with headless Chrome" thing is getting really popular…
Can't it turn out that I was the first one to build it?
Then I noticed
puppeteer / examples https://github.com/GoogleChrome/puppeteer/tree/master/examples
Mostly "I tried it" posts and explainers; few practical tools
Let's build the first practical crawler with headless Chrome
Other reasons
✓ Existing crawlers do not support Promises
✓ There was no Node.js crawler that runs in a distributed environment
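Setting the goals
✓ Has the features a practical crawler needs
✓ Documentation is written in English
✓ Tests give sufficient coverage
✓ Runs in a distributed environment
✓ Keeps the API simple
✓ Gets listed in puppeteer / examples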
With this, I win back my dignity as an engineer
…
Done https://github.com/yujiosaka/headless-chrome-crawler
Goal achieved https://github.com/GoogleChrome/puppeteer/tree/master/examples
Reposted on Google Developers https://developers.google.com/web/tools/puppeteer/examples
Startled by the jump in traffic
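Demo
const HCCrawler = require('headless-chrome-crawler');

HCCrawler.launch({
  maxDepth: …, // maximum depth to explore
  maxConcurrency: …, // maximum number of parallel requests
  allowedDomains: ['www.emin.co.jp'], // domains the crawler may visit
  evaluatePage: () => $('title').text(), // function evaluated on the page
  onSuccess: result => { // function evaluated on success
    console.log(`${result.options.url}\t${result.result}`);
  },
})
  .then(async crawler => {
    crawler.queue('https://www.emin.co.jp');
    await crawler.onIdle();
    await crawler.close();
  });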
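The depth and domain options in the demo can be read as a gate applied to every discovered link. The sketch below is one way such a gate could work; the option names come from the demo, while the helper `shouldFollow`, the sample value `maxDepth: 2`, and the filtering logic itself are assumptions, not the crawler's actual implementation.

```javascript
// Sketch: how options like maxDepth and allowedDomains could gate a link.
// The logic and the sample values below are illustrative assumptions.
function shouldFollow(link, depth, { maxDepth, allowedDomains }) {
  if (depth > maxDepth) return false;        // too many hops from the start page
  const { hostname } = new URL(link);
  return allowedDomains.includes(hostname);  // stay on permitted hosts only
}

const options = { maxDepth: 2, allowedDomains: ['www.emin.co.jp'] };
console.log(shouldFollow('https://www.emin.co.jp/blog/', 1, options)); // true
console.log(shouldFollow('https://example.com/', 1, options));         // false
console.log(shouldFollow('https://www.emin.co.jp/blog/', 3, options)); // false
```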
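How the crawler came together
The most minimal crawler
✓ "Crawling" and "scraping" are different things
✓ Crawling: finding links in HTML
✓ Scraping: finding the information you want in HTML
✓ Neither is much use on its own
What do the two have in common?
Finding ___ in HTML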
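Filling in the blank two ways makes the shared operation concrete. In the sketch below, the sample HTML and the helper `findInHtml` are assumptions for illustration, and the regular expressions are a naive stand-in for a real HTML parser or selector engine.

```javascript
// Sketch: crawling and scraping as the same operation with different patterns.
// The sample HTML and the regex-based matching are illustrative assumptions.
function findInHtml(html, pattern) {
  // `pattern` must be a global regex with one capture group.
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

const html = '<title>Shop</title><a href="/items">Items</a><a href="/cart">Cart</a>';

const links = findInHtml(html, /<a\s[^>]*href="([^"]*)"/g);   // crawling: find links
const titles = findInHtml(html, /<title>([^<]*)<\/title>/g);  // scraping: find data

console.log(links);
console.log(titles);
```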
Isn't jQuery good enough for that?
v1.0.0 release
jQuery: true, // automatically injects jQuery into the page
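example
const HCCrawler = require('headless-chrome-crawler');

HCCrawler.launch({
  jQuery: true,
  evaluatePage: () => $('title').text(),
  onSuccess: result => {
    console.log(`${result.options.url}\t${result.result}`);
  },
})
  .then(async crawler => {
    crawler.queue('https://www.emin.co.jp');
    await crawler.onIdle();
    await crawler.close();
  });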
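A crawler that doesn't drive you up the wall
✓ If you are used to static crawlers, it feels awfully slow
✓ Finding it quietly stopped on an error is seriously demoralizing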
Running in a distributed environment
✓ Use Redis for the task queue and the cache
✓ Share Redis across multiple servers
v1.3.0 release
cache: new RedisCache(), // use Redis as the cache storage
example (the same script appears three times on the slide)
const HCCrawler = require('headless-chrome-crawler');
const RedisCache = require('headless-chrome-crawler/cache/redis');

HCCrawler.launch({
  cache: new RedisCache({ host: …, port: … }),
  evaluatePage: () => $('title').text(),
  onSuccess: result => {
    console.log(`${result.options.url}\t${result.result}`);
  },
})
  .then(async crawler => {
    crawler.queue('https://www.amazon.co.jp');
    await crawler.onIdle();
    await crawler.close();
  });
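Why a shared cache lets several servers cooperate can be shown with a small model. In the sketch below a `Map` stands in for the shared Redis instance; `SharedCache`, `claim`, and `runWorker` are hypothetical names, and a real deployment would need an atomic "set if absent" on the shared store rather than this single-process check.

```javascript
// Sketch: workers skipping URLs that another worker already claimed.
// A Map stands in for the shared Redis instance (an assumption for illustration).
class SharedCache {
  constructor() {
    this.store = new Map();
  }

  // Returns true only for the first caller that claims the key.
  claim(key, owner) {
    if (this.store.has(key)) return false;
    this.store.set(key, owner);
    return true;
  }
}

// Each worker keeps only the URLs it managed to claim.
function runWorker(name, urls, cache) {
  return urls.filter((url) => cache.claim(url, name));
}

const cache = new SharedCache();
const urls = ['https://example.com/a', 'https://example.com/b'];
const first = runWorker('server-1', urls, cache);
const second = runWorker('server-2', [...urls, 'https://example.com/c'], cache);
console.log(first, second);
```

The second worker ends up with only the URL the first one never saw, so no page is fetched twice.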
Other features
✓ Breadth-first search (BFS) and depth-first search (DFS)
✓ Obeys robots.txt
✓ Follows XML sitemaps
✓ Device emulation
✓ Page screenshots
✓ JSON/CSV output
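The first feature in the list comes down to which end of the pending-URL list is visited next. The sketch below illustrates that on a made-up link graph; `crawlOrder` and the graph are assumptions for illustration and say nothing about how the crawler implements its queue.

```javascript
// Sketch: BFS and DFS differ only in which end of the frontier is taken next.
// The link graph is a made-up sample.
const graph = {
  '/': ['/a', '/b'],
  '/a': ['/a/1'],
  '/b': ['/b/1'],
  '/a/1': [],
  '/b/1': [],
};

function crawlOrder(start, depthFirst) {
  const frontier = [start];
  const seen = new Set([start]);
  const order = [];
  while (frontier.length > 0) {
    // DFS takes the newest entry (stack); BFS takes the oldest (queue).
    const page = depthFirst ? frontier.pop() : frontier.shift();
    order.push(page);
    for (const link of graph[page]) {
      if (!seen.has(link)) {
        seen.add(link);
        frontier.push(link);
      }
    }
  }
  return order;
}

console.log(crawlOrder('/', false)); // breadth-first
console.log(crawlOrder('/', true));  // depth-first
```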
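Where crawlers go from here
Current challenges
✓ Nobody is going to line up 100 servers to crawl with this crawler, and it would be a chore anyway
✓ I want to deploy to a distributed environment with a single command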
What is the Serverless Framework?
✓ Closer to a configuration management "tool"
✓ Makes it easy to deploy and run AWS Lambda, Azure Functions, and Google Cloud Functions
✓ Supports Node.js, Python, Java, Scala, C#, F#, Go, Groovy, Kotlin, PHP & Swift
✓ Plenty of handy plugins
v2.0.0 will be…
yarn (npm run) deploy: deploy to AWS Lambda
yarn (npm run) start: start crawling in parallel
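Can't I go down as the first to build a crawler with headless Chrome?
Can't I go down as the first to build a practical crawler with headless Chrome?
But what I really want is to write more code at work
WE ARE HIRING https://www.emin.co.jp/blog/news/1527/ Sales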