[NYJavaSig] Riding The Distributed Streams
Viktor Gamov
February 03, 2017
Technology
Presentation on Hazelcast and Distributed Streams.
Presented at NYJavaSig.
Transcript
> whoami • Solutions Architect @Hazelcast • Hang out with awesome people • @gamussa in internetz. Please follow me on Twitter, I’m very interesting ©
Agenda • Refreshing our knowledge of Java 8 Streams • Distribute and Conquer • Distributed Data • Distributed Streams • How we did all this
Java 8 Streams
Java 8 Streams… • An abstraction that represents a sequence of elements • Is not a data structure • Conveys elements from a source through a pipeline of operations • Operations don’t modify the source
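A minimal sketch of the points above, using only the JDK: a pipeline reads from a source list, applies lazy intermediate operations, and a terminal operation builds a new result while the source list stays unchanged. The class and method names are illustrative, not taken from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamPipelineDemo {

    // Builds a pipeline over `source`; the source list is only read, never modified.
    static List<String> upperCaseLongWords(List<String> source) {
        return source.stream()                  // source -> stream
                .filter(w -> w.length() > 4)    // intermediate operation (lazy)
                .map(String::toUpperCase)       // intermediate operation (lazy)
                .collect(Collectors.toList());  // terminal operation triggers evaluation
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("war", "peace", "pierre", "andrew"));
        List<String> result = upperCaseLongWords(words);
        System.out.println(result); // [PEACE, PIERRE, ANDREW]
        System.out.println(words);  // [war, peace, pierre, andrew] (unchanged)
    }
}
```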
Why should I care about the Stream API? • You’re a Java developer
What does a regular Java developer think about Scala? “Advanced.”
Why should I care about the Stream API? • You’re a Java developer • Many Java developers know Java • It’s all about data processing
java.util.stream operations • map(), flatMap(), filter() • reduce(), collect() • sorted()
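The operations listed above can be combined in a single pipeline. This illustrative sketch (hypothetical class and method names) splits lines into words with `flatMap()`, normalizes them with `map()`, drops empty tokens with `filter()`, orders them with `sorted()`, gathers them with `collect()`, and folds their lengths with `reduce()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamOperationsDemo {

    // flatMap(): split each line into words, producing one flat stream of words.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .map(String::toLowerCase)       // map(): normalize case
                .filter(w -> !w.isEmpty())      // filter(): drop empty tokens (e.g. from leading spaces)
                .sorted()                       // sorted(): natural (alphabetical) order
                .collect(Collectors.toList());  // collect(): gather into a List
    }

    // reduce(): fold the stream into a single value, here the total number of characters.
    static int totalLength(List<String> words) {
        return words.stream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> ws = words(Arrays.asList("War and Peace", "Andrew and Pierre"));
        System.out.println(ws);              // [and, and, andrew, peace, pierre, war]
        System.out.println(totalLength(ws)); // 26
    }
}
```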
Problem • One does not simply put all Big Data on one machine
Problem • Data doesn’t fit on a single machine
Problem • One does not simply put all Big Data on one machine • Data is too important to keep on only one machine
CACHES
Replication or Sharding? http://book.mixu.net/distsys/single-page.html
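To make the replication-versus-sharding question concrete, here is a simplified sketch that does both: each key is hashed to one of a fixed number of partitions (sharding), and every partition gets a backup on a different member than its primary (replication). The partition count of 271 mirrors Hazelcast's default, but the hash function (`hashCode()`) and the round-robin owner assignment are assumptions made for illustration; this is not Hazelcast's actual partitioning code.

```java
import java.util.Arrays;
import java.util.List;

class ShardingSketch {

    // Partition count for this sketch (Hazelcast's default is 271).
    static final int PARTITION_COUNT = 271;

    // Sharding: each key maps to exactly one partition via its hash.
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Assumed round-robin placement: partition p is owned by member p % memberCount.
    static int primaryOwner(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // Replication: the backup lives on the next member, so it is never on the
    // same member as the primary (requires memberCount >= 2).
    static int backupOwner(int partitionId, int memberCount) {
        return (primaryOwner(partitionId, memberCount) + 1) % memberCount;
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("member-A", "member-B", "member-C");
        for (String key : Arrays.asList("andrew", "pierre", "natasha")) {
            int p = partitionId(key);
            System.out.printf("%s -> partition %d, primary %s, backup %s%n",
                    key, p,
                    members.get(primaryOwner(p, members.size())),
                    members.get(backupOwner(p, members.size())));
        }
    }
}
```

Losing one member under this scheme never loses a partition outright, because each partition's backup sits elsewhere; that is the trade-off the slide's question points at.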
Solution • Use a distributed map, aka IMap
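Hazelcast's `IMap` extends `java.util.concurrent.ConcurrentMap`, so code written against the `ConcurrentMap` interface can run against either a local map or a distributed one. The sketch below relies on that: it is written only against `ConcurrentMap` and exercised with a local `ConcurrentHashMap`, so it runs without Hazelcast on the classpath. `WordCountStore` and `increment` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class WordCountStore {

    // Depends only on the ConcurrentMap interface. Passing an IMap obtained from a
    // HazelcastInstance is assumed to work the same way (not shown, to keep the
    // sketch dependency-free).
    static void increment(ConcurrentMap<String, Long> counts, String word) {
        // merge() inserts 1 for a new word or adds 1 to the existing count;
        // ConcurrentHashMap performs it atomically.
        counts.merge(word, 1L, Long::sum);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Long> counts = new ConcurrentHashMap<>();
        for (String w : "andrew pierre andrew".split(" ")) {
            increment(counts, w);
        }
        System.out.println(counts.get("andrew")); // 2
        System.out.println(counts.get("pierre")); // 1
    }
}
```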
What’s Hazelcast IMDG? • In-memory Data Grid • Apache v2 Licensed • Distributed • Caches (IMap, JCache) • Java Collections (IList, ISet, IQueue) • Messaging (Topic, RingBuffer) • Computation (ExecutorService, MapReduce)
[Diagram: data partitions spread across cluster members; legend: green primary, green backup, green shard]
Problem • Lambda serialization
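Why lambda serialization is a problem: to run a lambda on the cluster members that hold the data, it has to be serialized and sent over the network, but a lambda whose target type is a plain `java.util.function.Function` does not implement `Serializable`. The sketch below assumes Java serialization is the transport and shows that such a lambda fails to serialize, while the same lambda cast to the intersection type `Function & Serializable` succeeds. `LambdaSerializationDemo` and `isSerializable` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

class LambdaSerializationDemo {

    // Attempts Java serialization, which is what shipping a lambda to
    // another JVM would require under this sketch's assumption.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Plain lambda: the target type Function does not extend Serializable.
        Function<String, Integer> plain = String::length;
        // Intersection cast: the compiler generates a serializable lambda.
        Function<String, Integer> serializable =
                (Function<String, Integer> & Serializable) String::length;

        System.out.println(isSerializable(plain));        // false
        System.out.println(isSerializable(serializable)); // true
    }
}
```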
Solution • Serializable versions of the functional interfaces • Introducing DistributedStream
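Writing intersection casts at every call site is noisy, which is why the slide proposes serializable versions of the functional interfaces. A minimal sketch of that idea follows. `SerializableFunction`, `mapAll`, and `roundTrip` are hypothetical names for this example, not the names used by Hazelcast's `DistributedStream`. Because the lambda's target type extends `Serializable`, the compiler makes it serializable without a cast, and it survives a serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class SerializableFunctionsDemo {

    // Hypothetical interface: a Function that is also Serializable.
    @FunctionalInterface
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Hypothetical "distributed" map step: accepting only SerializableFunction
    // guarantees the lambda could be shipped to other members.
    static <T, R> List<R> mapAll(List<T> items, SerializableFunction<T, R> fn) {
        return items.stream().map(fn).collect(Collectors.toList());
    }

    // Round-trips an object through Java serialization.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerializableFunction<String, Integer> length = String::length; // no cast needed
        SerializableFunction<String, Integer> copy = roundTrip(length);
        System.out.println(copy.apply("Pierre"));                          // 6
        System.out.println(mapAll(Arrays.asList("War", "Peace"), length)); // [3, 5]
    }
}
```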
Jet Streams
What’s Hazelcast Jet? • General-purpose distributed data processing framework • Based on a Directed Acyclic Graph (DAG) to model data flow • Built on top of Hazelcast IMDG • Comparable to Apache Spark or Apache Flink
DAG
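A Jet job is modeled as a DAG whose vertices are processing steps and whose edges carry data downstream. The sketch below is not Jet's API: it is a hypothetical, minimal DAG class with an assumed word-count shape (source, tokenize, accumulate, combine, sink). It computes a topological order with Kahn's algorithm, an order in which every vertex comes after everything upstream of it, and rejects graphs that contain a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DagSketch {

    // Adjacency list: vertex -> downstream vertices. LinkedHashMap keeps
    // insertion order so the computed order is deterministic.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    DagSketch vertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    DagSketch edge(String from, String to) {
        vertex(from).vertex(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: repeatedly emit vertices with no remaining upstream edges.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new HashMap<>();
        edges.keySet().forEach(v -> inDegree.put(v, 0));
        edges.values().forEach(outs -> outs.forEach(v -> inDegree.merge(v, 1, Integer::sum)));

        Deque<String> ready = new ArrayDeque<>();
        for (String v : edges.keySet()) {
            if (inDegree.get(v) == 0) ready.add(v);
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String next : edges.get(v)) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        // Vertices left unvisited can only be part of a cycle.
        if (order.size() != edges.size()) throw new IllegalStateException("graph has a cycle");
        return order;
    }

    public static void main(String[] args) {
        DagSketch wordCount = new DagSketch()
                .edge("source", "tokenize")
                .edge("tokenize", "accumulate")
                .edge("accumulate", "combine")
                .edge("combine", "sink");
        System.out.println(wordCount.topologicalOrder());
        // [source, tokenize, accumulate, combine, sink]
    }
}
```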
Job Execution
Future (It’s bright!) • Memory module for processing big data • Higher-level streaming and batching APIs • Reactive Streams • Distributed classloading • Integrations (HDFS/YARN/Mesos)
Your fuel, our Jet Engine • Public release – Feb 7th • Developer Preview today – yay! • http://hazelcast.org/jet-signup • Send me a note [email protected] • Follow @hazelcast and @gamussa (duh!!) • Your questions #hazelcast #hazelcastjet
Conclusion • The Java Stream API provides a very wide range of data processing tools • War and Peace is a big (lots of data) book! • Now we’re pretty sure that Andrew and Pierre are the main characters
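The War and Peace conclusion suggests a word-frequency count over the book's text. A minimal single-JVM sketch of such a count with the Stream API follows. It runs on a short made-up sample rather than the book, so it says nothing about the real counts, and `WordCount`, `countWords`, and `top` are hypothetical names. It uses `flatMap()` to split lines into words and `Collectors.groupingBy(..., counting())` to count them, then sorts entries by frequency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordCount {

    // Any run of non-letter characters separates words.
    private static final Pattern NON_LETTERS = Pattern.compile("[^\\p{L}]+");

    // Classic word count: split into words, group identical words, count each group.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(NON_LETTERS::splitAsStream)
                .filter(w -> !w.isEmpty())  // a leading separator yields an empty token
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // The n most frequent words, most frequent first; ties broken alphabetically.
    static List<String> top(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Pierre looked at Andrew.",
                "Andrew smiled at Pierre and Natasha.",
                "Pierre laughed.");
        System.out.println(top(countWords(sample), 2)); // [pierre, andrew]
    }
}
```

On a real book the top entries would be dominated by words like "the" and "and", so a stop-word filter would be a natural extra `filter()` step.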