Introduction To Hadoop
Marc Cluet
June 18, 2013
Transcript
Marc Cluet – Lynx Consultants What’s behind Big Data
What we'll cover
- Understand Hadoop components
- Understand the different technologies involved
- Embrace Big Data!
Lynx Consultants © 2013
What is Big Data?
- SQL has a limited ability to process changing data
  - SQL schemas are the truth; data has to fit them
- Big Data is the solution!
  - Data can be truly dynamic
  - Designed to handle terabytes of data
  - Designed for fault tolerance and securing data
  - Designed around exploiting hardware to the fullest
  - Designed around Map/Reduce
Who runs Big Data?
- A few small companies
What is Hadoop?
- Hadoop is one of the big players for Big Data
  - Developed as an open-source implementation of Google's MapReduce and Google File System designs (Google BigTable is the model for HBase rather than for Hadoop itself)
  - Mainly developed at Yahoo!
  - Current companies behind it: Hortonworks and Cloudera
What are the features of Hadoop?
- HDFS – Hadoop Distributed File System
  - HDFS is a distributed filesystem spread across many nodes
  - Keeps several copies of your data (default: 3)
  - If a node goes down, HDFS re-replicates its data on the remaining nodes
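The replication behaviour described on the HDFS slide can be sketched as a toy model. This is not HDFS code: `place_blocks` and `handle_node_failure` are hypothetical helper names, the node and block names are made up, and placement here is random, whereas real HDFS placement is rack-aware and decided by the NameNode.

```python
import random

REPLICATION = 3  # HDFS default replication factor, as stated on the slide


def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy random policy)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in blocks}


def handle_node_failure(placement, failed, live_nodes,
                        replication=REPLICATION, seed=0):
    """Forget the failed node, then copy under-replicated blocks elsewhere."""
    rng = random.Random(seed)
    for holders in placement.values():
        holders.discard(failed)
        # Live nodes that do not yet hold a copy of this block.
        candidates = [n for n in live_nodes if n not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(rng.randrange(len(candidates))))
    return placement


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_1", "blk_2", "blk_3"], nodes)

# node2 dies: every block it held is copied to another live node.
live = [n for n in nodes if n != "node2"]
placement = handle_node_failure(placement, "node2", live)

for block, holders in sorted(placement.items()):
    print(block, sorted(holders))
```

After the failure every block is back to three copies, none of them on the dead node, which is the property the slide describes.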
What are the features of Hadoop?
- HBase – Hadoop NoSQL Database
  - Schemaless key-value storage
  - All data exportable in JSON
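To make "schemaless key-value storage" concrete, here is a minimal sketch that uses a plain Python dictionary as a stand-in for a table. It does not use the HBase client API. The `put` helper, the row keys and the column names are all invented for illustration. The `family:qualifier` column naming follows HBase's convention, where column families are declared up front but qualifiers are free-form.

```python
import json

# Toy stand-in for a table: row key -> {"family:qualifier": value}.
table = {}


def put(row_key, column, value):
    """Store one cell; rows do not have to share the same set of columns."""
    table.setdefault(row_key, {})[column] = value


put("user#1", "info:name", "Ada")
put("user#1", "info:email", "ada@example.com")
put("user#2", "info:name", "Linus")
put("user#2", "stats:logins", 42)  # a column that user#1 never declared

# Because rows are just nested key-value maps, exporting to JSON is direct.
exported = json.dumps(table, indent=2, sort_keys=True)
print(exported)
```

No schema change was needed to give `user#2` a column that `user#1` lacks, which is the contrast with the SQL model drawn earlier in the deck.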
What are the features of Hadoop?
- Map/Reduce – The key to it all
  - Invented by Google
  - Given a dataset, we Map every record that matches a criterion
  - Then we Reduce the mapped records to a result
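The two steps on the Map/Reduce slide can be sketched in a few lines of single-process Python. This only illustrates the programming model and is not Hadoop's Java API. The function names, the sample lines and the "starts with h" criterion are assumptions chosen for the example. The intermediate `shuffle` step, which groups mapped values by key, is handled by the framework in a real cluster.

```python
from collections import defaultdict


def map_phase(records, predicate):
    """Map: emit a (word, 1) pair for every word that matches the criterion."""
    for record in records:
        for word in record.split():
            if predicate(word):
                yield word.lower(), 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["Hadoop stores data", "Hadoop processes data", "Hive queries Hadoop"]
counts = reduce_phase(
    shuffle(map_phase(lines, lambda w: w.lower().startswith("h")))
)
print(counts)  # {'hadoop': 3, 'hive': 1}
```

In a cluster the map calls run in parallel on the nodes holding the input, and only the grouped pairs travel over the network to the reducers.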
What are the features of Hadoop?
- Hive – SQL for NoSQL
  - Hive provides a SQL-like language called HiveQL
  - Provides a good entrance for SQL users :)
What are the features of Hadoop?
- Pig – Map/Reduce made easy
  - Produces data results from scripts written in a reduced language (Pig Latin)
  - Reinvents SQL somehow
What are the features of Hadoop?
- Flume – Fault-tolerant transport
  - Divides into Sources, Channels and Sinks
  - Can have multiples of everything, which makes it fault tolerant
  - Many sources!
    - Avro, Exec, JMS, Syslog, HTTP, NetCat, Your Own (Java)
  - Many channels!
    - Memory, File, Your Own (Java)
  - Many sinks!
    - Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, Community sinks, Your Own (Java)
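The source, channel and sink split on the Flume slide can be pictured with a small toy model. It is plain Python and does not use Flume's Java API or configuration format. `MemoryChannel`, `ListSink`, `try_write` and `drain` are hypothetical names, and the "try the next sink" policy is only one simple way to show why having several sinks helps with fault tolerance.

```python
from collections import deque


class MemoryChannel:
    """Buffer that decouples sources from sinks (toy in-memory channel)."""

    def __init__(self):
        self.events = deque()

    def put(self, event):
        self.events.append(event)


class ListSink:
    """Sink that records delivered events, or fails when marked unhealthy."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.delivered = []

    def write(self, event):
        if not self.healthy:
            raise IOError(f"{self.name} is unavailable")
        self.delivered.append(event)


def try_write(sink, event):
    """Return True if the sink accepted the event, False if it failed."""
    try:
        sink.write(event)
        return True
    except IOError:
        return False


def drain(channel, sinks):
    """Deliver each buffered event to the first sink that accepts it."""
    while channel.events:
        event = channel.events[0]  # peek, so nothing is lost on failure
        if not any(try_write(sink, event) for sink in sinks):
            return  # every sink is down: events stay in the channel
        channel.events.popleft()


channel = MemoryChannel()
for line in ["GET /index.html", "GET /about.html"]:  # a source pushing events
    channel.put(line)

primary = ListSink("hdfs-sink", healthy=False)  # simulate an outage
backup = ListSink("file-sink")
drain(channel, [primary, backup])
print(backup.delivered)
```

An event leaves the channel only after some sink has accepted it, so an outage of one sink delays or reroutes events instead of dropping them.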
What Hadoop looks like in a DC
- Components
  - Primary Namenode
    - Controls the whole cluster and knows where the data resides
    - Runs the job tracker to keep track of Map/Reduce jobs
    - Biggest point of failure; shadowing it is a potential option
  - Secondary Namenode
    - Performs secondary cleanup operations
  - Data Node
    - Stores all the information
    - Runs Map/Reduce
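Because the Namenode knows where each block lives and the Data Nodes both store data and run Map/Reduce, work can be sent to the data rather than the other way round. The sketch below shows that idea only. `schedule_map_tasks` is a hypothetical helper, the block and node names are invented, and the real job tracker's scheduling is considerably more involved.

```python
# Namenode-style metadata: which Data Nodes hold each block (made-up values).
block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
    "blk_3": ["dn1", "dn3", "dn4"],
}


def schedule_map_tasks(block_locations):
    """Give each block's map task to the least-loaded node that stores it."""
    load = {}
    assignment = {}
    for block, holders in sorted(block_locations.items()):
        # Prefer the holder with the fewest tasks; break ties by node name.
        node = min(holders, key=lambda n: (load.get(n, 0), n))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment


assignment = schedule_map_tasks(block_locations)
print(assignment)  # {'blk_1': 'dn1', 'blk_2': 'dn2', 'blk_3': 'dn3'}
```

Every task lands on a node that already has a local copy of its block, so no input data has to cross the network before the map step starts.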
Questions?