Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Introduction for Brownant
Search
Jiangge Zhang
April 11, 2014
Programming
4
150
Introduction for Brownant
Jiangge Zhang
April 11, 2014
Tweet
Share
Other Decks in Programming
See All in Programming
登壇は dynamic! な営みである / speech is dynamic
da1chi
0
360
釣り地図SNSにおける有料機能の実装
nokonoko1203
0
200
What's new in Spring Modulith?
olivergierke
1
170
Six and a half ridiculous things to do with Quarkus
hollycummins
0
220
Android16 Migration Stories ~Building a Pattern for Android OS upgrades~
reoandroider
0
140
Blazing Fast UI Development with Compose Hot Reload (Bangladesh KUG, October 2025)
zsmb
1
210
Cursorハンズオン実践!
eltociear
2
1.2k
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
300
alien-signals と自作 OSS で実現する フレームワーク非依存な ロジック共通化の探求 / Exploring Framework-Agnostic Logic Sharing with alien-signals and Custom OSS
aoseyuu
2
730
Migration to Signals, Resource API, and NgRx Signal Store
manfredsteyer
PRO
0
120
EMこそClaude Codeでコード調査しよう
shibayu36
0
450
あなたとKaigi on Rails / Kaigi on Rails + You
shimoju
0
190
Featured
See All Featured
Code Reviewing Like a Champion
maltzj
526
40k
A better future with KSS
kneath
239
18k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
1k
Building Applications with DynamoDB
mza
96
6.7k
Rebuilding a faster, lazier Slack
samanthasiow
84
9.2k
Being A Developer After 40
akosma
91
590k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.7k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.7k
[RailsConf 2023] Rails as a piece of cake
palkan
57
5.9k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Transcript
(´ŋ_ŋ`)
贴⼀一个 URL,得到数据 “抓”
需要结构化的数据 • ⽤用统计学⽅方法猜测 —— Readability、Pocket、搜索引擎 • ⽤用约定的协议(如 schema.org、OpenGraph) —— Facebook、Twitter
Card、搜索引擎 • 抓取者定制规则 —— ⾖豆瓣东⻄西
⽐比较笨的实现 写⼀一个函数去匹配和分解 URL
⽐比较笨的实现 ⼜又写⼀一个函数去抽取信息
其实⼤大部分规则 都是可以 ⽤用配置⽂文件写出来的
配置⽂文件就是这样 。。。。。。
静态配置的问题 • 原来 title_pattern = `//*[@id=info]/h2/text()` • 后来这个⺴⽹网站改版了,需要请求另⼀一个 API 才能拿
到标题 • 我们就 。。。。。。
如果需要能伸能缩 DSL
最好的例⼦子 DSL with Ruby
既是语⾔言,也是配置
Brownant 基于 Python 的 descriptor 特性
Python 的 descriptor • 拦截 getattr、setattr、delattr • 能访问宿主对象 • “元属性”
None
Pipeline o.title o.etree o.text_response o.http_client o.url
Pipeline o.title o.etree o.text_response o.http_client o.ajax_response o.ajax_json o.price o.url o.ajax_url
借助了其他开源库 • Werkzeug —— URL 分发 • lxml 和 requests
—— 访问⺴⽹网络、解析 HTML • six —— 兼容 Python 2 / Python 3
接下来希望解决的问题 • ⽂文档太简陋 • 内置 PipelineProperty 类型太少 • 不会画蚂蚁,所以没 Logo
github.com:douban/brownant Waiting your pull request tonight~❤️