3+ years as a data engineer (big data)
Currently Data Engineer at Sale Stock
Lead, FB DevC Malang
Big data and JavaScript lover
Father of a 3-year-old son
My Steam account: hellowin_cavemen
instance, used by our application/service

Everything went well until…
We faced 50,000 rows per second (18 million rows per hour)
Storage consumption was more than 100 GB per day
A single query could take more than 5 hours
The storage and query problems arise because a single machine does everything.
So let's break the work down across multiple machines instead.

[Diagram: writes and reads are spread across Machine 1 (Data 1), Machine 2 (Data 2), ... Machine n (Data n)]
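[Diagram: partitioning by key range]
Machine 1: Data A-C
Machine 2: Data D-F
Machine n: Data nx-ny
Read Christine profile: goes to Machine 1
Write Dony profile: goes to Machine 2

[Diagram: partitioning by date range]
Machine 1: Data Date 1-5
Machine 2: Data Date 6-10
Machine n: Data Date nx-ny
Scan data date 2-9: touches Machine 1 and Machine 2
Scan data date 6-7: touches Machine 2 only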
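The routing shown on the partitioning slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the machine names, the `G-Z` range standing in for "Machine n", and the helper functions `machine_for_key` and `machines_for_date_scan` are all hypothetical and do not come from any particular big data platform.

```python
from bisect import bisect_left

# Hypothetical key-range layout: each machine owns first letters up to an
# inclusive upper bound (A-C, D-F). The third range (G-Z) is an assumption
# standing in for "Machine n".
KEY_MACHINES = ["machine-1", "machine-2", "machine-3"]
KEY_UPPER_BOUNDS = ["C", "F", "Z"]

# Hypothetical date-range layout: (machine, first day, last day), inclusive.
DATE_PARTITIONS = [("machine-1", 1, 5), ("machine-2", 6, 10)]


def machine_for_key(key: str) -> str:
    """Route a point read or write to the one machine that owns the key."""
    first_letter = key[0].upper()
    index = bisect_left(KEY_UPPER_BOUNDS, first_letter)
    if index == len(KEY_MACHINES):
        raise ValueError(f"no partition owns key {key!r}")
    return KEY_MACHINES[index]


def machines_for_date_scan(start_day: int, end_day: int) -> list[str]:
    """Return every machine whose date range overlaps the scanned range."""
    return [
        machine
        for machine, first_day, last_day in DATE_PARTITIONS
        if first_day <= end_day and start_day <= last_day
    ]


if __name__ == "__main__":
    print(machine_for_key("Christine"))   # point read
    print(machine_for_key("Dony"))        # point write
    print(machines_for_date_scan(2, 9))   # wide scan
    print(machines_for_date_scan(6, 7))   # narrow scan
```

A point lookup touches exactly one machine, while a range scan touches only the machines whose ranges overlap the query. That is why the narrow scan of dates 6-7 is cheaper than the scan of dates 2-9.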
all, a quorum, or only one of the replica machines report success

[Diagram, two panels: Input data 1 and Input data 2 are written to Machine 1, Machine 2, and Machine 3]
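The three acknowledgement rules named here (all, quorum, one) can be expressed as a small check. The sketch below assumes that a quorum means a strict majority of replicas, and the names `required_acks` and `write_succeeded` are hypothetical helpers, not the API of any specific database.

```python
def required_acks(level: str, replica_count: int) -> int:
    """Number of replicas that must report success for a given level."""
    if level == "all":
        return replica_count
    if level == "quorum":
        # Assumption: quorum is a strict majority of the replicas.
        return replica_count // 2 + 1
    if level == "one":
        return 1
    raise ValueError(f"unknown consistency level: {level!r}")


def write_succeeded(level: str, replica_results: list[bool]) -> bool:
    """Decide whether a write counts as successful.

    replica_results holds one boolean per replica machine:
    True if that machine reported success, False otherwise.
    """
    acks = sum(replica_results)
    return acks >= required_acks(level, len(replica_results))


if __name__ == "__main__":
    # Three replicas, one of which failed to acknowledge the write.
    results = [True, True, False]
    for level in ("all", "quorum", "one"):
        print(level, write_succeeded(level, results))
```

Stricter levels give stronger guarantees that every replica holds the data, at the cost of failing (or waiting) whenever a single machine is slow or down. Looser levels accept the write sooner but leave some replicas temporarily behind.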
Big data platforms have partitioning, replication, and consistency control, which make them capable of handling large amounts of data.
There's no silver bullet; choose wisely which technology will solve your problems.