$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
[NYJavaSig] Riding The Distributed Streams
Search
Viktor Gamov
February 03, 2017
Technology
1
200
[NYJavaSig] Riding The Distributed Streams
Presentation on Hazelcast and Distributed Streams.
Presented on NYJavaSig
Viktor Gamov
February 03, 2017
Tweet
Share
More Decks by Viktor Gamov
See All by Viktor Gamov
Processing Streaming Data with KSQL
vikgamov
4
390
[VirtualJUG] Apache Kafka — A Streaming Data Platform
vikgamov
3
390
[SF JUG] Apache Kafka — A Streaming Data Platform
vikgamov
4
89
[OracleCode NYC-2018] Apache Kafka A Streaming Data Platform
vikgamov
1
180
[OracleCode NYC-2018] Rethinking Stream Processing with KStreams and KSQL
vikgamov
2
230
[JBreak-2018] Это кто там твитить про #jbreak?
vikgamov
0
220
[DevNexus-2018] Apache Kafka A Streaming Data Platform
vikgamov
2
300
[DataSciCon] Divide, Distribute and Conquer: Stream v. Batch
vikgamov
0
110
[Philly JUG] Divide, Distribute and Conquer: Stream v. Batch
vikgamov
0
480
Other Decks in Technology
See All in Technology
Playwright x GitHub Actionsで実現する「レビューしやすい」E2Eテストレポート
kinosuke01
0
320
モバイルゲーム開発におけるエージェント技術活用への試行錯誤 ~開発効率化へのアプローチの紹介と未来に向けた展望~
qualiarts
0
650
生成AIでテスト設計はどこまでできる? 「テスト粒度」を操るテーラリング術
shota_kusaba
0
440
AI活用によるPRレビュー改善の歩み ― 社内全体に広がる学びと実践
lycorptech_jp
PRO
1
180
5分で知るMicrosoft Ignite
taiponrock
PRO
0
170
Debugging Edge AI on Zephyr and Lessons Learned
iotengineer22
0
110
学習データって増やせばいいんですか?
ftakahashi
1
160
エンジニアリングマネージャー はじめての目標設定と評価
halkt
0
250
Oracle Technology Night #95 GoldenGate 26ai の実装に迫る1
oracle4engineer
PRO
0
150
なぜ使われないのか?──定量×定性で見極める本当のボトルネック
kakehashi
PRO
1
1.2k
[JAWS-UG 横浜支部 #91]DevOps Agent vs CloudWatch Investigations -比較と実践-
sh_fk2
1
240
大企業でもできる!ボトムアップで拡大させるプラットフォームの作り方
findy_eventslides
1
500
Featured
See All Featured
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
15k
Product Roadmaps are Hard
iamctodd
PRO
55
12k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
121
20k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
How GitHub (no longer) Works
holman
316
140k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
32
2.7k
Optimizing for Happiness
mojombo
379
70k
Git: the NoSQL Database
bkeepers
PRO
432
66k
The World Runs on Bad Software
bkeepers
PRO
72
12k
YesSQL, Process and Tooling at Scale
rocio
174
15k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
36
6.2k
The Language of Interfaces
destraynor
162
25k
Transcript
None
> whoami • Solutions Architect @Hazelcast • Hang out with
awesome people • @gamussa in internetz Please, follow me in Twitter I’m very interesting ©
Agenda • Refreshing knowledge on Java 8 Streams • Distribute
and Conquer • Distributed Data • Distributed Streams • How we did all this
Java 8 Streams
Java 8 Streams… • An abstraction represents a sequence of
elements • Is not a data structure • Convey elements from a source through a pipeline of operations • Operation doesn’t modify a source
Why I should care about Stream API? • You’re Java
developer
What does regular Java developer think about Scala? advanced
Why I should care about Stream API? • You’re Java
developer • Many Java developers know Java • It’s all about data processing
java.util.stream operations • map(), flatMap(), filter() • reduce(), collect() •
sorted()
None
None
None
Problem • One does not simply put all Big Data
in one machine
Problem • Data doesn’t fit just one machine
Problem • One does not simply put all Big Data
in one machine • Data is too important to have it only one machine
None
CACHES
Replication on Sharding? http://book.mixu.net/distsys/single-page.html
Solution • Use Distributed Map aka IMap
What’s Hazelcast IMDG? • In-memory Data Grid • Apache v2
Licensed • Distributed • Caches (IMap, JCache) • Java Collections (IList, ISet, IQueue) • Messaging (Topic, RingBuffer) • Computation (ExecutorService, M-R)
None
None
None
Green Primary Green Backup Green Shard
None
Problem • Lambda serialization 26
27
Solution • serializable version of the interfaces • Introducing DistributedStream
28
29
None
31 Jet Streams
None
What’s Hazelcast Jet? • General purpose distributed data processing framework
• Based on Direct Acyclic Graph to model data flow • Built on top of Hazelcast IMDG • Comparable to Apache Spark or Apache Flink 33
None
DAG 35
Job Execution 36
None
Future (It’s bright!) • Memory module for processing big data
• Higher level streaming and batching APIs • Reactive Streams • Distributed Classloading • Integrations (HDFS/Yarn/Mesos)
Your fuel, our Jet Engine • Public release – Feb
7th. • Developer Preview today - yay! • http://hazelcast.org/jet-signup • Send me a note
[email protected]
• Follow @hazelcast and @gamussa (duh!!) • Your questions #hazelcast #hazelcastjet
Conclusion • Java Stream API provides very white range of
data processing tools • War And Piece – is a Big (a lot of data) Book! • Now we’re pretty sure that Andrew and Pierre are the main characters
None