How to scrape web contents in Clojure
ayato
January 09, 2016
Programming
Transcript
How to scrape web contents in Clojure @_ayato_p
Ayapi
  Clojurian
  Cybozu Startups, Inc.
What is web scraping
  Web scraping is a computer software technique for extracting information from websites. (by Wikipedia)
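Problems
  Web content is shaped much like a tree.
  There are many similar pages, but they differ in subtle ways.
  You have to write recursive code to walk the tree.
  It is mostly tedious.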
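To make the "recursive code" problem concrete, here is a minimal hand-written sketch in plain Clojure. It is not taken from the slides: the `site` map is a hypothetical in-memory stand-in for fetched pages, and `crawl` is an illustrative name. It shows the kind of traversal you end up writing for every site when no library does it for you.

```clojure
;; Hypothetical in-memory "site" standing in for real HTTP fetches.
;; Each page has a type; index and archive pages link to child pages.
(def site
  {"/"        {:type :index   :links ["/2016" "/2015"]}
   "/2016"    {:type :archive :links ["/2016/01"]}
   "/2015"    {:type :archive :links ["/2015/12"]}
   "/2016/01" {:type :entry   :title "Hello"}
   "/2015/12" {:type :entry   :title "Bye"}})

(defn crawl
  "Recursively walks the page tree from url and returns a vector of entry titles."
  [url]
  (let [{:keys [type links title]} (get site url)]
    (case type
      ;; Leaf page: extract the data we want.
      :entry [title]
      ;; Branch pages: recurse into every linked page.
      (:index :archive) (vec (mapcat crawl links)))))

(crawl "/") ;; => ["Hello" "Bye"]
```

Even in this toy version, the traversal logic and the per-page extraction logic are tangled together in one function, and every new page type needs another branch.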
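Skyscraper
  It walks the tree recursively for you.
  You only write how to process each type of page.
  It returns a lazy sequence.
  It comes with a caching mechanism.
  The scraping part depends on Enlive.
  https://github.com/nathell/skyscraper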
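The idea of "one processor per page type, with the library doing the walking" can be sketched in plain Clojure as follows. This is only an illustration of the concept under stated assumptions, not Skyscraper's implementation or API: `pages`, `processors`, and `walk` are hypothetical names, the page store replaces real HTTP requests, and merging parent context into child results is an assumption of this sketch.

```clojure
;; Hypothetical page store standing in for HTTP fetches and HTML parsing.
(def pages
  {"/user"      {:archives ["/user/1601" "/user/1512"]}
   "/user/1601" {:tweets ["a" "b"]}
   "/user/1512" {:tweets ["c"]}})

;; One function per page type. Each returns a seq of maps.
;; A map carrying :url and :processor means "visit this page next".
(def processors
  {:user-page     (fn [page _ctx]
                    (for [url (:archives page)]
                      {:url url :processor :archives-page}))
   :archives-page (fn [page _ctx]
                    (for [t (:tweets page)]
                      {:tweet t}))})

(defn walk
  "Generic driver: expands contexts until only leaf maps (no :processor) remain.
  Returns a lazy sequence of those leaf maps."
  [contexts]
  (lazy-seq
    (mapcat (fn [{:keys [url processor] :as ctx}]
              (if processor
                (let [results ((processors processor) (pages url) ctx)
                      parent  (dissoc ctx :url :processor)]
                  ;; Child results inherit the parent's accumulated context.
                  (walk (map #(merge parent %) results)))
                [ctx]))
            contexts)))

(def result
  (walk [{:username "user" :url "/user" :processor :user-page}]))

(take 2 result)
;; => ({:username "user", :tweet "a"} {:username "user", :tweet "b"})
```

The per-page functions contain no traversal code at all; adding a new page type means adding one entry to `processors`.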
Example
    ;; Aliases as used on the slide: s is Skyscraper, html is Enlive (net.cgrand.enlive-html).
    ;; seed returns the initial context: the user's Twilog top page, handled by user-page.
    (defn seed [username from until]
      (let [url (str "http://twilog.org/" username)]
        [{:username username
          :from from
          :until until
          :url url
          :processor ::user-page}]))

    ;; Processor for the user's top page: report an error, or move on to the archives page.
    (s/defprocessor user-page
      :cache-template "twilog/:username"
      :process-fn (fn [res {:keys [username]}]
                    (let [not-registered (seq (html/select res [:div.box-info.box-icon]))
                          not-found (seq (html/select res [:div.box-attention.box-icon]))]
                      (cond
                        not-registered [{:msg "This account was not registered."}]
                        not-found [{:msg "This account was not found."}]
                        :else [{:url (str "http://twilog.org/" username "/archives")
                                :processor ::archives-page}]))))
Example
    ;; Entry point: takes a username and an optional options map with defaults.
    (defn scrape
      [username & [{:as options
                    :keys [html-cache processed-cache from until]
                    :or {html-cache true
                         processed-cache true
                         from "00000000"
                         until "99999999"}}]]
      ;; create-handler is not shown on the slides; it is presumably defined elsewhere.
      (let [handler (create-handler identity options)]
        (handler (s/scrape (seed username from until)
                           :html-cache html-cache
                           :processed-cache processed-cache))))
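The argument list of `scrape` combines an optional trailing map, `:keys`, `:or` defaults, and `:as`. The small sketch below isolates that destructuring pattern so its behavior is easy to check at a REPL. `options-demo` is a hypothetical function written for illustration; it only echoes what was bound and does no scraping.

```clojure
(defn options-demo
  "Returns the values bound by the same destructuring shape used in scrape."
  [username & [{:as options
                :keys [html-cache from until]
                :or {html-cache true
                     from "00000000"
                     until "99999999"}}]]
  {:username   username
   :html-cache html-cache
   :from       from
   :until      until
   ;; :as binds the map exactly as passed in; :or defaults are NOT merged into it.
   :options    options})

;; No options map at all: every default applies and options is nil.
(options-demo "alice")
;; => {:username "alice", :html-cache true, :from "00000000",
;;     :until "99999999", :options nil}

;; An explicit false is kept; :or only fills in keys that are missing.
(options-demo "alice" {:from "20160101" :html-cache false})
;; => {:username "alice", :html-cache false, :from "20160101",
;;     :until "99999999", :options {:from "20160101", :html-cache false}}
```

One consequence worth noting: because `:as options` holds only what the caller supplied, anything that receives `options` (such as `create-handler` in the slide) does not see the `:or` defaults.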
Conclusion
  Using Skyscraper makes you happy.
  Clojure is the best!
Enjoy Clojure