Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Scrapy Overview
Search
JusBrasil
April 12, 2013
Programming
2
190
Scrapy Overview
An overview of the Scrapy framework by @cacovsky
JusBrasil
April 12, 2013
Tweet
Share
Other Decks in Programming
See All in Programming
TypeScriptのmoduleオプションを改めて整理する
bicstone
4
420
漸進。
ssssota
0
1.1k
複雑なフォームを継続的に開発していくための技術選定・設計・実装 #tskaigi / #tskaigi2025
izumin5210
12
6.3k
❄️ tmux-nixの実装を通して学ぶNixOSモジュール
momeemt
1
120
抽象データ型について学んだ
ryounasso
0
210
コンポーネントライブラリで実現する、アクセシビリティの正しい実装パターン
schktjm
1
660
Rethinking Data Access: The New httpResource in Angular
manfredsteyer
PRO
0
220
TypeScript Language Service Plugin で CSS Modules の開発体験を改善する
mizdra
PRO
3
2.4k
What Spring Developers Should Know About Jakarta EE
ivargrimstad
1
600
Devinで実践する!AIエージェントと協働する開発組織の作り方
masahiro_nishimi
6
2.5k
Language Server と喋ろう – TSKaigi 2025
pizzacat83
2
670
OpenNext + Hono on Cloudflare でイマドキWeb開発スタックを実現する
rokuosan
0
110
Featured
See All Featured
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
48
5.4k
Fireside Chat
paigeccino
37
3.5k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
12k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
45
9.6k
Optimising Largest Contentful Paint
csswizardry
37
3.3k
Six Lessons from altMBA
skipperchong
28
3.8k
Making the Leap to Tech Lead
cromwellryan
133
9.3k
It's Worth the Effort
3n
184
28k
Code Review Best Practice
trishagee
68
18k
Documentation Writing (for coders)
carmenintech
71
4.8k
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
Docker and Python
trallard
44
3.4k
Transcript
Scrapy an overview
/skræpi/
Web Crawler vs. Web Scraper
None
None
Scrapy Framework Scraping / Crawling / Monitoring / Testing
Stable Active Large community
~200 pages of docs
Commercial support
Framework?
None
None
None
Twisted event loop (reactor)
None
Your code goes here
The scraping logic
None
HttpErrorMiddleware UrlLengthMiddleware DepthMiddleware
HttpProxyMiddleware HttpCacheMiddleware RedirectMiddleware
Media download Persistence Post-processing
Data flow control
Queuing
Talk is cheap, show me the code.
$ pip install Scrapy $ scrapy startproject home_news
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project root
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project config
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project module
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your items
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your pipelines
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your settings
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your spiders...
None
//*[@id="glbcorpo"]/div/div[1]/div[1]/div[2]/div[1]/div[1]/div/div/a/@href
//*[@id="glbmateria"]/div[2]/h1/text()
//*[@id="materialetra"]/div/div/p[1]/text()
None
$ pwd /home/caco/studies/scrapy_news/home_news
$ pwd /home/caco/studies/scrapy_news/home_news (project root)
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json (feed exporters: json,csv,xml)
None
None
None
Other nice features • scrapyd: run as a service •
Webservice (issue commands via http requests) • Signals • Stats module • Contribs (CrawlSpider etc)
Obrigado! @cacovsky Thanks! @cacovsky
Images Spatula http://www.duebuoi.it/x/uk_usd/catalog/p/spatulas~805-16x10.html Spiderman http://tincan21.deviantart.com/art/muro-spidey-307810412