Searching over streams with Luwak and Samza

Martin Kleppmann

January 31, 2015

Talk co-presented with Alan Woodward at FOSDEM, Brussels, Belgium, on 31 January 2015. http://martin.kleppmann.com/2015/01/31/searching-over-streams-at-fosdem.html

Abstract:

Real-time searching over streams is useful in a number of contexts. For example, companies may want to detect whenever they are mentioned in a news feed; or a Twitter user might want to see a continuous stream of tweets for a particular hashtag.

Luwak (https://github.com/flaxsearch/luwak) provides a mechanism for running many thousands of queries over a single document in a highly efficient manner, by filtering out queries that it can detect will not match. Luwak is designed to run on a single node, holding all registered queries in RAM. Scaling to higher document throughput, or to more queries, requires parallelization across multiple machines.
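
The filtering idea can be illustrated with a small sketch. This is not Luwak's API or its actual algorithm: the class `QueryPrefilter` and its methods are hypothetical names, queries are assumed to be simple AND-of-terms, and documents are tokenized by whitespace. Each query is indexed under one of its required terms, so a document only triggers full evaluation of queries whose indexed term it contains.

```python
from collections import defaultdict


class QueryPrefilter:
    """Toy prefilter for AND-of-terms queries (hypothetical, not Luwak's API)."""

    def __init__(self):
        self.queries = {}                # query id -> set of required terms
        self.by_term = defaultdict(set)  # indexed term -> ids of queries filed under it

    def register(self, query_id, terms):
        required = {t.lower() for t in terms}
        if not required:
            raise ValueError("a query needs at least one term")
        self.queries[query_id] = required
        # File the query under a single required term. A document that lacks
        # this term cannot satisfy the query, so it is safe to skip it.
        self.by_term[min(required)].add(query_id)

    def candidates(self, doc_terms):
        """Return ids of queries that might match; all others are ruled out."""
        found = set()
        for term in doc_terms:
            found |= self.by_term.get(term, set())
        return found

    def match(self, text):
        """Return (sorted matching ids, number of queries fully evaluated)."""
        doc_terms = set(text.lower().split())
        cands = self.candidates(doc_terms)
        matches = sorted(q for q in cands if self.queries[q] <= doc_terms)
        return matches, len(cands)


prefilter = QueryPrefilter()
prefilter.register("q1", ["samza", "kafka"])
prefilter.register("q2", ["luwak"])
prefilter.register("q3", ["lucene", "search"])
matches, evaluated = prefilter.match("Luwak runs stored queries on top of Lucene")
print(matches, evaluated)
```

In this example only two of the three registered queries are fully evaluated, and the one that shares no indexed term with the document is never touched. With many thousands of registered queries, skipping work this way is where the savings would come from.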

Samza (http://samza.apache.org/) provides a framework for such parallelization by partitioning and recombining both the document streams and the query set (which can be treated as just another stream). It also provides fault-tolerance mechanisms that allow swift recovery from machine failure, without losing documents or queries.
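
One possible partitioning layout can be sketched in a single process. This does not use Samza, and it is an assumption rather than the design presented in the talk: queries are hash-partitioned by id, every document is sent to every partition, and the per-partition matches are merged. The names `partition_for`, `MatcherPartition` and `PartitionedMatcher` are hypothetical, and queries are again assumed to be simple AND-of-terms.

```python
import zlib


def partition_for(key, num_partitions):
    """Stable hash partitioning (crc32 gives the same result across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class MatcherPartition:
    """Holds one slice of the query set, as a single node would."""

    def __init__(self):
        self.queries = {}  # query id -> set of required terms

    def add_query(self, query_id, terms):
        required = {t.lower() for t in terms}
        if not required:
            raise ValueError("a query needs at least one term")
        self.queries[query_id] = required

    def match(self, doc_terms):
        return {q for q, terms in self.queries.items() if terms <= doc_terms}


class PartitionedMatcher:
    """Routes queries to partitions and recombines per-partition matches."""

    def __init__(self, num_partitions):
        self.partitions = [MatcherPartition() for _ in range(num_partitions)]

    def add_query(self, query_id, terms):
        # The query set is treated like a keyed stream: each query lands
        # on exactly one partition, chosen by its id.
        index = partition_for(query_id, len(self.partitions))
        self.partitions[index].add_query(query_id, terms)

    def match(self, text):
        doc_terms = set(text.lower().split())
        combined = set()
        for partition in self.partitions:  # every partition sees the document
            combined |= partition.match(doc_terms)
        return sorted(combined)


matcher = PartitionedMatcher(num_partitions=4)
matcher.add_query("q1", ["samza", "kafka"])
matcher.add_query("q2", ["luwak"])
matcher.add_query("q3", ["kafka"])
print(matcher.match("Samza reads its input from Kafka"))
```

Under this layout, adding partitions reduces the number of queries each node holds and evaluates. Raising document throughput would instead call for partitioning the document stream and replicating the queries, which this sketch does not cover.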

Transcript