Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Importing Wikipedia in Plone
Search
Makina Corpus
October 02, 2013
Technology
1
91
Importing Wikipedia in Plone
Eric BREHAULT – Plone Conference 2013
Makina Corpus
October 02, 2013
Tweet
Share
More Decks by Makina Corpus
See All by Makina Corpus
Publier vos données sur le Web - Forum TIC de l'ATEN 2014
makinacorpus
0
800
Créez votre propre fond de plan à partir de données OSM en utilisant TileMill
makinacorpus
0
140
Team up Django and Web mapping - DjangoCon Europe 2014
makinacorpus
3
890
Petit déjeuner "Les bases de la cartographie sur le Web"
makinacorpus
0
440
Petit déjeuner "Développer sur le cloud, ou comment tout construire à partir de rien" le 11 février - Toulouse
makinacorpus
0
280
CoDe, le programme de développement d'applications mobiles de Makina Corpus
makinacorpus
0
120
Petit déjeuner "Alternatives libres à GoogleMaps" du 11 février 2014 - Nantes - Sylvain Beorchia
makinacorpus
0
680
Petit déjeuner "Les nouveautés de la cartographie en ligne" du 12 décembre
makinacorpus
1
400
Tests carto avec Mocha
makinacorpus
0
830
Other Decks in Technology
See All in Technology
AI が Approve する開発フロー / How AI Reviewers Accelerate Our Development
zaimy
1
190
インシデント対応入門
grimoh
7
5.1k
[続・営業向け 誰でも話せるOCI セールストーク] AWSよりOCIの優位性が分からない編(2026年2月20日開催)
oracle4engineer
PRO
0
100
Amazon Bedrock AgentCoreでブラウザ拡張型AI調査エージェントを開発した話 (シングルエージェント編)
nasuvitz
2
110
AI時代のAPIファースト開発
nagix
1
530
React 19時代のコンポーネント設計ベストプラクティス
uhyo
17
6.8k
「静的解析」だけで終わらせない。 SonarQube の最新機能 × AIで エンジニアの開発生産性を本気で上げる方法
xibuka
2
270
Intro SAGA Event Space
midnight480
0
150
意志を実装するアーキテクチャモダナイゼーション
nwiizo
3
1.7k
EMから現場に戻って見えた2026年の開発者視点
sudoakiy
1
420
What's new in Go 1.26?
ciarana
2
160
歴史に敬意を! パラシュートVPoEが組織と共同で立ち上がる信頼醸成オンボーディング
go0517go
PRO
0
190
Featured
See All Featured
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.3k
The browser strikes back
jonoalderson
0
730
Self-Hosted WebAssembly Runtime for Runtime-Neutral Checkpoint/Restore in Edge–Cloud Continuum
chikuwait
0
370
How To Stay Up To Date on Web Technology
chriscoyier
791
250k
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
YesSQL, Process and Tooling at Scale
rocio
174
15k
Building an army of robots
kneath
306
46k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
133
19k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.3k
A Modern Web Designer's Workflow
chriscoyier
698
190k
What Being in a Rock Band Can Teach Us About Real World SEO
427marketing
0
180
Transcript
Importing Wikipedia in Plone Eric BREHAULT – Plone Conference 2013
ZODB is good at storing objects • Plone contents are
objects, • we store them in the ZODB, • everything is fine, end of the story.
But what if ... ... we want to store non-contentish
records? Like polls, statistics, mail-list subscribers, etc., or any business-specific structured data.
Store them as contents anyway That is a powerfull solution.
But there are 2 major problems...
Problem 1: You need to manage a secondary system •
you need to deploy it, • you need to backup it, • you need to secure it, • etc.
Problem 2: I hate SQL No explanation here.
I think I just cannot digest it...
How to store many records in the ZODB? • Is
the ZODB strong enough? • Is the ZCatalog strong enough?
My grandmother often told me "If you want to become
stronger, you have to eat your soup."
Where do we find a good soup for Plone? In
a super souper!!!
souper.plone and souper • It provides both storage and indexing.
• Record can store any persistent pickable data. • Created by BlueDynamics. • Based on ZODB BTrees, node.ext.zodb, and repoze.catalog.
Add a record >>> soup = get_soup('mysoup', context) >>> record
= Record() >>> record.attrs['user'] = 'user1' >>> record.attrs['text'] = u'foo bar baz' >>> record.attrs['keywords'] = [u'1', u'2', u'ü'] >>> record_id = soup.add(record)
Record in record >>> record['homeaddress'] = Record() >>> record['homeaddress'].attrs['zip'] =
'6020' >>> record['homeaddress'].attrs['town'] = 'Innsbruck' >>> record['homeaddress'].attrs['country'] = 'Austria'
Access record >>> from souper.soup import get_soup >>> soup =
get_soup('mysoup', context) >>> record = soup.get(record_id)
Query >>> from repoze.catalog.query import Eq, Contains >>> [r for
r in soup.query(Eq('user', 'user1') & Contains('text', 'foo'))] [<Record object 'None' at ...>] or using CQE format >>> [r for r in soup.query("user == 'user1' and 'foo' in text")] [<Record object 'None' at ...>]
souper • a Soup-container can be moved to a specific
ZODB mount- point, • it can be shared across multiple independent Plone instances, • souper works on Plone and Pyramid.
Plomino & souper • we use Plomino to build non-content
oriented apps easily, • we use souper to store huge amount of application data.
Plomino data storage Originally, documents (=record) were ATFolder. Capacity about
30 000.
Plomino data storage Since 1.14, documents are pure CMF. Capacity
about 100 000. Usally the Plomino ZCatalog contains a lot of indexes.
Plomino & souper With souper, documents are just soup records.
Capacity: several millions.
Typical use case • Store 500 000 addresses, • Be
able to query them in full text and display the result on a map. Demo
What is the limit? Can we import Wikipedia in souper?
Demo with 400 000 records Demo with 5,5 millions of records
Conclusion • Usage performances are good, • Plone performances are
not impacted. Use it!
Thoughts • What about a REST API on top of
it? • Massive import is long and difficult, could it be improved?
Makina Corpus For all questions related to this talk, please
contact Éric Bréhault
[email protected]
Tel : +33 534 566 958 www.makina-corpus.com