Introduction for Brownant
Jiangge Zhang
April 11, 2014
Transcript
(´ŋ_ŋ`)
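Paste a URL, get data: "scraping"
We need structured data • Guess with statistical methods (Readability, Pocket, search engines) • Use agreed-upon protocols such as schema.org and OpenGraph (Facebook, Twitter Card, search engines) • Let the scraper define custom rules (Douban Dongxi)
A rather clumsy implementation: write one function to match and decompose the URL
A rather clumsy implementation: then write another function to extract the information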
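As a rough illustration of the two hand-written functions described above, here is a minimal sketch using only the standard library. The host `shop.example.com`, the `/item/<id>` URL layout, and the sample HTML are assumptions made up for this example; they do not come from the slides.

```python
import re
from urllib.parse import urlparse

# Hypothetical URL layout: http://shop.example.com/item/<numeric id>
ITEM_PATH = re.compile(r"^/item/(?P<item_id>\d+)/?$")
# Hypothetical page layout: the title lives in the first <h2> element.
TITLE_TAG = re.compile(r"<h2[^>]*>(?P<title>.*?)</h2>", re.S)


def match_url(url):
    """Return the item id if the URL belongs to the site, else None."""
    parts = urlparse(url)
    if parts.netloc != "shop.example.com":
        return None
    found = ITEM_PATH.match(parts.path)
    return int(found.group("item_id")) if found else None


def extract_title(html):
    """Pull the first <h2> text out of an already-downloaded page."""
    found = TITLE_TAG.search(html)
    return found.group("title").strip() if found else None


sample_html = '<div id="info"><h2> Brown Ant Plush </h2></div>'
item_id = match_url("http://shop.example.com/item/42")
title = extract_title(sample_html)
```

Every new site needs another pair of functions like these, which is the repetition the following slides try to remove.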
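In fact, most rules can be written as configuration files
The configuration file looks like this ......
Problems with static configuration • Originally: title_pattern = `//*[@id=info]/h2/text()` • Later the site was redesigned, and the title could only be fetched by requesting another API • So we ......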
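The limitation of static configuration can be sketched as follows. The rule dictionary mirrors the `title_pattern` line above; the `/api/item/<id>` endpoint, `fake_fetch`, and `title_from_api` are hypothetical names invented for this example. The point is only that a string pattern can be stored as data, while "call another API and read a field" is behavior.

```python
import json

# A static rule is plain data, so it fits in a configuration file.
static_rule = {"title_pattern": "//*[@id=info]/h2/text()"}
serialized = json.dumps(static_rule)


def title_from_api(fetch, item_id):
    """After the (hypothetical) redesign, the title comes from a JSON API."""
    payload = json.loads(fetch("/api/item/%d" % item_id))
    return payload["title"]


def fake_fetch(path):
    """Stand-in for a real HTTP call so the sketch runs offline."""
    if path == "/api/item/42":
        return json.dumps({"title": "Brown Ant Plush"})
    return "{}"


title = title_from_api(fake_fetch, 42)

# The new rule is a function, which a static format such as JSON cannot hold.
try:
    json.dumps({"title": title_from_api})
    rule_is_serializable = True
except TypeError:
    rule_is_serializable = False
```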
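If it needs to stretch and shrink as required: a DSL
The best example: DSLs with Ruby
Both a language and a configuration
Brownant is built on Python's descriptor feature
Python descriptors • Intercept getattr, setattr, delattr • Can access the host object • "Meta-attributes"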
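The three descriptor traits listed above can be seen in a small standalone sketch. `Logged` and `Page` are illustrative names, not part of Brownant; the example assumes Python 3.6 or later for `__set_name__`.

```python
class Logged:
    """A data descriptor that records every access on its host object."""

    def __set_name__(self, owner, name):
        # Called once when the host class is created.
        self.name = name
        self.slot = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class, not an instance
            return self
        obj.log.append(("get", self.name))
        return getattr(obj, self.slot)

    def __set__(self, obj, value):
        obj.log.append(("set", self.name))
        setattr(obj, self.slot, value)

    def __delete__(self, obj):
        obj.log.append(("delete", self.name))
        delattr(obj, self.slot)


class Page:
    title = Logged()

    def __init__(self):
        self.log = []


page = Page()
page.title = "Brownant"   # routed through Logged.__set__
current = page.title      # routed through Logged.__get__
del page.title            # routed through Logged.__delete__
```

Each hook receives `obj`, the host instance, which is what lets a descriptor read and write other state on the object that owns it.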
Pipeline o.title o.etree o.text_response o.http_client o.url
Pipeline o.title o.etree o.text_response o.http_client o.ajax_response o.ajax_json o.price o.url o.ajax_url
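One way to read the pipeline slides is as a chain of attributes in which each value is computed lazily from the ones it depends on. The sketch below shows that idea with a non-data descriptor; `LazyProperty`, `FakeHttpClient`, and `ItemPage` are hypothetical stand-ins written for this example and are not Brownant's actual classes or API. A regular expression replaces the `etree` step so the code needs only the standard library.

```python
import re


class LazyProperty:
    """Compute a value from the host object on first access, then cache it."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Non-data descriptor: the instance dict entry shadows it from now on.
        obj.__dict__[self.name] = value
        return value


class FakeHttpClient:
    """Offline stand-in for an HTTP client; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def get(self, url):
        self.calls += 1
        return "<html><h2>Brown Ant</h2></html>"


class ItemPage:
    def __init__(self, url, http_client):
        self.url = url
        self.http_client = http_client

    @LazyProperty
    def text_response(self):
        # Depends on o.http_client and o.url.
        return self.http_client.get(self.url)

    @LazyProperty
    def title(self):
        # Depends on o.text_response.
        return re.search(r"<h2>(.*?)</h2>", self.text_response).group(1)


client = FakeHttpClient()
page = ItemPage("http://shop.example.com/item/42", client)
first = page.title    # triggers text_response, which triggers the fetch
second = page.title   # served from the cache, no second fetch
```

Because each step only names the attributes it reads, adding a second branch (for example an AJAX request feeding a price) means adding more properties rather than rewriting the existing ones.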
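Built on other open-source libraries • Werkzeug: URL dispatching • lxml and requests: network access and HTML parsing • six: Python 2 / Python 3 compatibility
Problems to solve next • The documentation is too sparse • Too few built-in PipelineProperty types • Can't draw an ant, so there is no logo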
github.com:douban/brownant Waiting for your pull request tonight~❤️