Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Scrapy Overview
Search
JusBrasil
April 12, 2013
Programming
2
190
Scrapy Overview
An overview of the Scrapy framework by @cacovsky
JusBrasil
April 12, 2013
Tweet
Share
Other Decks in Programming
See All in Programming
AIエンジニアリングのご紹介 / Introduction to AI Engineering
rkaga
8
3.2k
バックエンドエンジニアによる Amebaブログ K8s 基盤への CronJobの導入・運用経験
sunabig
0
170
生成AIを利用するだけでなく、投資できる組織へ
pospome
2
380
モデル駆動設計をやってみようワークショップ開催報告(Modeling Forum2025) / model driven design workshop report
haru860
0
280
AIコーディングエージェント(Gemini)
kondai24
0
260
Microservices rules: What good looks like
cer
PRO
0
1.6k
Denoのセキュリティに関する仕組みの紹介 (toranoana.deno #23)
uki00a
0
130
Kotlin Multiplatform Meetup - Compose Multiplatform 외부 의존성 아키텍처 설계부터 운영까지
wisemuji
0
110
The Art of Re-Architecture - Droidcon India 2025
siddroid
0
120
Combinatorial Interview Problems with Backtracking Solutions - From Imperative Procedural Programming to Declarative Functional Programming - Part 2
philipschwarz
PRO
0
100
AtCoder Conference 2025「LLM時代のAHC」
imjk
2
560
Python札幌 LT資料
t3tra
6
1k
Featured
See All Featured
GraphQLとの向き合い方2022年版
quramy
50
14k
The B2B funnel & how to create a winning content strategy
katarinadahlin
PRO
0
180
Leveraging Curiosity to Care for An Aging Population
cassininazir
1
130
Measuring & Analyzing Core Web Vitals
bluesmoon
9
710
Rebuilding a faster, lazier Slack
samanthasiow
85
9.3k
AI in Enterprises - Java and Open Source to the Rescue
ivargrimstad
0
1k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
34k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
Leading Effective Engineering Teams in the AI Era
addyosmani
9
1.4k
Building a Scalable Design System with Sketch
lauravandoore
463
34k
Optimizing for Happiness
mojombo
379
70k
Building an army of robots
kneath
306
46k
Transcript
Scrapy an overview
/skræpi/
Web Crawler vs. Web Scraper
None
None
Scrapy Framework Scraping / Crawling / Monitoring / Testing
Stable Active Large community
~200 pages of docs
Commercial support
Framework?
None
None
None
Twisted event loop (reactor)
None
Your code goes here
The scraping logic
None
HttpErrorMiddleware UrlLengthMiddleware DepthMiddleware
HttpProxyMiddleware HttpCacheMiddleware RedirectMiddleware
Media download Persistence Post-processing
Data flow control
Queuing
Talk is cheap, show me the code.
$ pip install Scrapy $ scrapy startproject home_news
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project root
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project config
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project module
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your items
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your pipelines
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your settings
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your spiders...
None
//*[@id="glbcorpo"]/div/div[1]/div[1]/div[2]/div[1]/div[1]/div/div/a/@href
//*[@id="glbmateria"]/div[2]/h1/text()
//*[@id="materialetra"]/div/div/p[1]/text()
None
$ pwd /home/caco/studies/scrapy_news/home_news
$ pwd /home/caco/studies/scrapy_news/home_news (project root)
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json (feed exporters: json,csv,xml)
None
None
None
Other nice features • scrapyd: run as a service •
Webservice (issue commands via http requests) • Signals • Stats module • Contribs (CrawlSpider etc)
Obrigado! @cacovsky Thanks! @cacovsky
Images Spatula http://www.duebuoi.it/x/uk_usd/catalog/p/spatulas~805-16x10.html Spiderman http://tincan21.deviantart.com/art/muro-spidey-307810412