Importing Wikipedia in Plone
Eric BREHAULT – Plone Conference 2013
Makina Corpus
October 02, 2013
Transcript
Importing Wikipedia in Plone
Eric BREHAULT – Plone Conference 2013
ZODB is good at storing objects
• Plone contents are objects,
• we store them in the ZODB,
• everything is fine, end of story.
But what if... we want to store non-contentish records? Like polls, statistics, mailing-list subscribers, or any business-specific structured data.
Store them as contents anyway
That is a powerful solution. But there are two major problems...
Problem 1: You need to manage a secondary system
• you need to deploy it,
• you need to back it up,
• you need to secure it,
• etc.
Problem 2: I hate SQL
No explanation here. I think I just cannot digest it...
How to store many records in the ZODB?
• Is the ZODB strong enough?
• Is the ZCatalog strong enough?
My grandmother often told me: "If you want to become stronger, you have to eat your soup."
Where do we find a good soup for Plone? In a super souper!!!
souper.plone and souper
• It provides both storage and indexing (see the catalog-factory sketch below).
• A record can store any persistent picklable data.
• Created by BlueDynamics.
• Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
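Before records can be indexed, souper expects a catalog factory registered as a named utility matching the soup name. A minimal sketch following souper's documented pattern; the 'mysoup' name and the indexed attributes mirror the examples below:

from zope.component import provideUtility
from zope.interface import implementer
from repoze.catalog.catalog import Catalog
from repoze.catalog.indexes.field import CatalogFieldIndex
from repoze.catalog.indexes.text import CatalogTextIndex
from souper.interfaces import ICatalogFactory
from souper.soup import NodeAttributeIndexer

@implementer(ICatalogFactory)
class MySoupCatalogFactory(object):
    """Builds the repoze.catalog used to index 'mysoup' records."""

    def __call__(self, context=None):
        catalog = Catalog()
        # field index for exact-match queries on the 'user' attribute
        catalog[u'user'] = CatalogFieldIndex(NodeAttributeIndexer('user'))
        # text index for full-text queries on the 'text' attribute
        catalog[u'text'] = CatalogTextIndex(NodeAttributeIndexer('text'))
        return catalog

# the utility name must match the soup name passed to get_soup
provideUtility(MySoupCatalogFactory(), ICatalogFactory, name='mysoup')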
Add a record
>>> from souper.soup import get_soup, Record
>>> soup = get_soup('mysoup', context)
>>> record = Record()
>>> record.attrs['user'] = 'user1'
>>> record.attrs['text'] = u'foo bar baz'
>>> record.attrs['keywords'] = [u'1', u'2', u'ü']
>>> record_id = soup.add(record)
Record in record
>>> record.attrs['homeaddress'] = Record()
>>> record.attrs['homeaddress'].attrs['zip'] = '6020'
>>> record.attrs['homeaddress'].attrs['town'] = 'Innsbruck'
>>> record.attrs['homeaddress'].attrs['country'] = 'Austria'
Access record
>>> from souper.soup import get_soup
>>> soup = get_soup('mysoup', context)
>>> record = soup.get(record_id)
Query
>>> from repoze.catalog.query import Eq, Contains
>>> [r for r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))]
[<Record object 'None' at ...>]
or using CQE format
>>> [r for r in soup.query("user == 'user1' and 'foo' in text")]
[<Record object 'None' at ...>]
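With millions of records, building the full result list at once can be expensive. souper also documents a lazy variant that yields LazyRecord objects and only loads a record when it is called (a sketch, assuming souper's soup.lazy API):
>>> for lazy in soup.lazy(Eq('user', 'user1')):
...     record = lazy()  # the record is loaded from the ZODB only here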
souper
• a soup container can be moved to a specific ZODB mount point (see the zope.conf sketch below),
• it can be shared across multiple independent Plone instances,
• souper works on Plone and Pyramid.
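A minimal zope.conf sketch of such a mount point; the database name, file path, and mount path are hypothetical:

<zodb_db soups>
    mount-point /plone/soups
    <filestorage>
        path /var/plone/filestorage/soups.fs
    </filestorage>
</zodb_db>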
Plomino & souper • we use Plomino to build non-content
oriented apps easily, • we use souper to store huge amount of application data.
Plomino data storage
Originally, documents (= records) were ATFolders. Capacity: about 30,000.
Plomino data storage
Since 1.14, documents are pure CMF. Capacity: about 100,000. Usually the Plomino ZCatalog contains a lot of indexes.
Plomino & souper With souper, documents are just soup records.
Capacity: several millions.
Typical use case
• Store 500,000 addresses,
• Be able to query them in full text and display the results on a map.
Demo
What is the limit? Can we import Wikipedia into souper?
Demo with 400,000 records
Demo with 5.5 million records
Conclusion
• Usage performance is good,
• Plone performance is not impacted.
Use it!
Thoughts
• What about a REST API on top of it?
• Massive import is long and difficult; could it be improved? (See the batch-import sketch below.)
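One common way to keep a massive import tractable is to commit in batches, so ZODB memory use stays bounded. A minimal sketch; the soup name 'wikipedia', the rows iterable, and its fields are hypothetical:

import transaction
from souper.soup import get_soup, Record

def bulk_import(context, rows, batch_size=10000):
    """Load many records into a soup, committing in batches."""
    soup = get_soup('wikipedia', context)
    for i, row in enumerate(rows, start=1):
        record = Record()
        record.attrs['title'] = row['title']
        record.attrs['text'] = row['text']
        soup.add(record)
        if i % batch_size == 0:
            # flush pending changes so the ZODB cache can be released
            transaction.commit()
    transaction.commit()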
Makina Corpus
For all questions related to this talk, please contact Éric Bréhault.
[email protected]
Tel: +33 534 566 958
www.makina-corpus.com