Importing Wikipedia in Plone
Makina Corpus
October 02, 2013
Eric BREHAULT – Plone Conference 2013
Transcript
Importing Wikipedia in Plone Eric BREHAULT – Plone Conference 2013
ZODB is good at storing objects
• Plone contents are objects,
• we store them in the ZODB,
• everything is fine, end of the story.
But what if ...
... we want to store non-contentish records?
Like polls, statistics, mail-list subscribers, etc., or any business-specific structured data.
Store them as contents anyway
That is a powerful solution.
But there are 2 major problems...
Problem 1: You need to manage a secondary system
• you need to deploy it,
• you need to back it up,
• you need to secure it,
• etc.
Problem 2: I hate SQL
No explanation here. I think I just cannot digest it...
How to store many records in the ZODB?
• Is the ZODB strong enough?
• Is the ZCatalog strong enough?
My grandmother often told me "If you want to become stronger, you have to eat your soup."
Where do we find a good soup for Plone?
In a super souper!!!
souper.plone and souper
• It provides both storage and indexing.
• A record can store any persistent, picklable data.
• Created by BlueDynamics.
• Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
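To make "picklable" concrete: a value can go into a record only if Python's pickle module can serialize it. The helper below is a hypothetical illustration using only the standard library; `is_picklable` is not part of souper.

```python
import pickle
import threading


def is_picklable(value):
    """Return True if `value` survives a pickle round trip unchanged."""
    try:
        return pickle.loads(pickle.dumps(value)) == value
    except Exception:
        # pickle raises different exception types depending on the object
        # (PicklingError, TypeError, AttributeError), so catch broadly here.
        return False


# Plain data such as strings, lists and dicts pickles fine.
print(is_picklable({'user': 'user1', 'keywords': ['1', '2', 'ü']}))  # True

# A lock wraps an operating-system resource and cannot be pickled.
print(is_picklable(threading.Lock()))  # False
```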
Add a record
>>> from souper.soup import get_soup, Record
>>> soup = get_soup('mysoup', context)
>>> record = Record()
>>> record.attrs['user'] = 'user1'
>>> record.attrs['text'] = u'foo bar baz'
>>> record.attrs['keywords'] = [u'1', u'2', u'ü']
>>> record_id = soup.add(record)
Record in record
>>> record['homeaddress'] = Record()
>>> record['homeaddress'].attrs['zip'] = '6020'
>>> record['homeaddress'].attrs['town'] = 'Innsbruck'
>>> record['homeaddress'].attrs['country'] = 'Austria'
Access record
>>> from souper.soup import get_soup
>>> soup = get_soup('mysoup', context)
>>> record = soup.get(record_id)
Query
>>> from repoze.catalog.query import Eq, Contains
>>> [r for r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))]
[<Record object 'None' at ...>]
or using CQE format
>>> [r for r in soup.query("user == 'user1' and 'foo' in text")]
[<Record object 'None' at ...>]
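The query above combines an exact match on one field with a text match on another. As a rough intuition for why indexed queries stay fast on large soups, here is a toy sketch in plain Python: each index maps a value to a set of record ids, and "and" becomes a set intersection. `ToyCatalog` is a hypothetical teaching class, not how repoze.catalog or souper is actually implemented.

```python
from collections import defaultdict


class ToyCatalog:
    """Tiny in-memory catalog: one exact-match index and one word index."""

    def __init__(self):
        self.records = {}                # record id -> record dict
        self.by_user = defaultdict(set)  # user -> record ids (exact match)
        self.by_word = defaultdict(set)  # word -> record ids (text search)

    def add(self, record):
        record_id = len(self.records) + 1
        self.records[record_id] = record
        self.by_user[record['user']].add(record_id)
        for word in record['text'].lower().split():
            self.by_word[word].add(record_id)
        return record_id

    def query(self, user, word):
        """Records whose user equals `user` AND whose text has `word`."""
        # .get() avoids creating empty index entries for unknown keys.
        user_ids = self.by_user.get(user, set())
        word_ids = self.by_word.get(word.lower(), set())
        return [self.records[i] for i in sorted(user_ids & word_ids)]


catalog = ToyCatalog()
catalog.add({'user': 'user1', 'text': 'foo bar baz'})
catalog.add({'user': 'user2', 'text': 'foo only'})
catalog.add({'user': 'user1', 'text': 'nothing relevant'})

matches = catalog.query('user1', 'foo')
print(matches)  # [{'user': 'user1', 'text': 'foo bar baz'}]
```

The lookup never scans all records: it touches only the two index entries involved, which is the property that matters once a soup holds millions of records.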
souper
• a Soup-container can be moved to a specific ZODB mount point,
• it can be shared across multiple independent Plone instances,
• souper works on Plone and Pyramid.
Plomino & souper
• we use Plomino to build non-content oriented apps easily,
• we use souper to store huge amounts of application data.
Plomino data storage
Originally, documents (= records) were ATFolder objects.
Capacity: about 30 000.
Plomino data storage
Since 1.14, documents are pure CMF.
Capacity: about 100 000.
Usually the Plomino ZCatalog contains a lot of indexes.
Plomino & souper
With souper, documents are just soup records.
Capacity: several millions.
Typical use case
• Store 500 000 addresses,
• Be able to query them in full text and display the result on a map.
Demo
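The talk does not say how the demo feeds results to the map. One common approach is to serialize matching records as GeoJSON, which most web mapping libraries can display. The sketch below assumes each result is a dict with hypothetical 'label', 'lat' and 'lon' keys; the sample values are invented for illustration.

```python
import json


def to_geojson(records):
    """Build a GeoJSON FeatureCollection from address-like dicts.

    The 'label', 'lat' and 'lon' keys are hypothetical field names.
    """
    features = []
    for rec in records:
        features.append({
            'type': 'Feature',
            # GeoJSON orders coordinates as [longitude, latitude].
            'geometry': {
                'type': 'Point',
                'coordinates': [rec['lon'], rec['lat']],
            },
            'properties': {'label': rec['label']},
        })
    return {'type': 'FeatureCollection', 'features': features}


# Invented sample results standing in for the output of a soup query.
results = [
    {'label': 'Example address A', 'lat': 43.60, 'lon': 1.44},
    {'label': 'Example address B', 'lat': 47.22, 'lon': -1.55},
]
collection = to_geojson(results)
print(json.dumps(collection, indent=2))
```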
What is the limit?
Can we import Wikipedia in souper?
Demo with 400 000 records
Demo with 5.5 million records
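The slides do not show how the Wikipedia dump was read. A dump that size cannot be loaded into memory at once, so one plausible approach is to stream it page by page. The sketch below uses the standard library's `iterparse` on a tiny in-memory stand-in; the sample XML, the namespace version in it, and the `iter_pages` helper are all assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

# A two-page stand-in for a dump file. Real dumps are far larger and
# declare an XML namespace, which local_name() below strips.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Plone</title><revision><text>Plone is a CMS.</text></revision></page>
  <page><title>ZODB</title><revision><text>ZODB is an object database.</text></revision></page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def iter_pages(stream):
    """Yield (title, text) pairs one page at a time."""
    for _event, elem in ET.iterparse(stream, events=('end',)):
        if local_name(elem.tag) != 'page':
            continue
        title = text = ''
        for child in elem.iter():
            if local_name(child.tag) == 'title':
                title = child.text or ''
            elif local_name(child.tag) == 'text':
                text = child.text or ''
        yield title, text
        elem.clear()  # release the finished page's children


pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
# [('Plone', 'Plone is a CMS.'), ('ZODB', 'ZODB is an object database.')]
```

Each yielded pair could then be copied into a record's attrs and added to the soup.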
Conclusion
• Usage performance is good,
• Plone performance is not impacted.
Use it!
Thoughts
• What about a REST API on top of it?
• Mass import is slow and difficult; could it be improved?
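On the import question, one general technique worth trying is to commit in batches rather than once per record or once at the very end. The sketch below is not the author's import script: `batches` and `bulk_import` are hypothetical helpers, and the add and commit steps are passed in as callables so the example runs without a ZODB. With souper, `add_record` could be `soup.add` and `commit` could be `transaction.commit` (an assumption, not shown in the talk).

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_import(records, add_record, commit, batch_size=1000):
    """Add every record, committing once per batch; return the count."""
    total = 0
    for chunk in batches(records, batch_size):
        for record in chunk:
            add_record(record)
        commit()  # with the ZODB this could be transaction.commit (assumption)
        total += len(chunk)
    return total


# Stand-ins so the sketch runs without a ZODB.
stored, commits = [], []
count = bulk_import(range(2500), stored.append, lambda: commits.append(1))
print(count, len(commits))  # 2500 3
```

The batch size is a tuning knob: larger batches mean fewer commits but more uncommitted data held in memory.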
Makina Corpus
For all questions related to this talk, please contact Éric Bréhault
[email protected]
Tel: +33 534 566 958
www.makina-corpus.com