Introduction To Hadoop
Marc Cluet
June 18, 2013
Transcript
Marc Cluet – Lynx Consultants What’s behind Big Data
What we’ll cover
- Understand Hadoop components
- Understand the different technologies involved
- Embrace Big Data!
Lynx Consultants © 2013
What is Big Data?
- SQL has a limited ability to process changing data
  - The SQL schema is the source of truth, and data has to fit it
- Big Data is the solution!
  - Data can be truly dynamic
  - Designed to handle terabytes of data
  - Designed for fault tolerance and securing data
  - Designed around exploiting hardware to the fullest
  - Designed around Map/Reduce
Who runs Big Data?
- A few small companies
What is Hadoop?
- Hadoop is one of the big players for Big Data
  - Developed as an open source implementation of the ideas in Google's GFS and MapReduce papers (HBase is the part modeled on Google BigTable)
  - Mainly developed at Yahoo!
  - Current companies behind it: Hortonworks and Cloudera
What are the features of Hadoop?
- HDFS – Hadoop Distributed File System
  - HDFS is a distributed filesystem spread across many nodes
  - Keeps several copies of your data (default: 3)
  - If a node goes down, it makes sure the data is re-replicated and rebalanced
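The replication behaviour described on the HDFS slide can be sketched in a few lines of Python. This is a toy model, not the HDFS API: `place_blocks`, `handle_node_failure`, and the node and block names are all hypothetical, and real HDFS uses a rack-aware placement policy rather than the simple round-robin assumed here.

```python
import itertools

REPLICATION = 3  # default replication factor mentioned on the slide


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if len(nodes) < replication:
        raise ValueError("need at least as many nodes as replicas")
    ring = itertools.cycle(nodes)
    # Consecutive picks from the ring are distinct because len(nodes) >= replication.
    return {block: {next(ring) for _ in range(replication)} for block in blocks}


def handle_node_failure(placement, failed, nodes, replication=REPLICATION):
    """Drop the failed node and copy under-replicated blocks to survivors."""
    survivors = [n for n in nodes if n != failed]
    for holders in placement.values():
        holders.discard(failed)
        # Only nodes that do not already hold the block can take a new copy.
        candidates = [n for n in survivors if n not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return placement


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(["blk_a", "blk_b"], nodes)
placement = handle_node_failure(placement, "node2", nodes)
```

After the simulated failure of `node2`, every block is back to three copies on the remaining nodes, which is the property the slide describes.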
What are the features of Hadoop?
- HBase – Hadoop NoSQL Database
  - Schemaless Key-Value storage
  - All data exportable in JSON
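To make "schemaless key-value storage" concrete, here is a minimal Python sketch. It does not use the HBase client; the `table` dict, the `put` helper, and the row and column names are hypothetical stand-ins chosen for illustration.

```python
import json

# Toy stand-in for a schemaless table: row key -> {column: value}.
table = {}


def put(row_key, column, value):
    """Store a value; rows are free to use columns other rows never declared."""
    table.setdefault(row_key, {})[column] = value


put("user:1", "info:name", "Ada")
put("user:1", "info:email", "ada@example.com")
put("user:2", "info:name", "Linus")
put("user:2", "stats:logins", 42)  # a column that user:1 does not have

# Every row is plain key-value data, so the whole table serialises to JSON.
exported = json.dumps(table, sort_keys=True)
```

No schema change was needed to add `stats:logins` to one row only, which is the contrast with a fixed SQL schema that the earlier slides draw.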
What are the features of Hadoop?
- Map/Reduce – The key to it all
  - This was invented by Google
  - Given a dataset, we Map all the records that match a criterion
  - Then we Reduce this to a result
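The Map and Reduce steps above can be illustrated with a single-process word count. This is a sketch of the idea only, not Hadoop's Java MapReduce API; the function names, the sample lines, and the "longer than three characters" criterion are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records, predicate):
    """Map: emit (key, 1) for every word that matches the criterion."""
    for record in records:
        for word in record.split():
            if predicate(word):
                yield (word.lower(), 1)


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values to a single result."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["Hadoop stores data", "Hadoop maps data", "then reduces it"]
counts = reduce_phase(shuffle(map_phase(lines, lambda w: len(w) > 3)))
```

In a real cluster the map calls run in parallel on the nodes that hold the data and the grouped keys are partitioned across reducers; here everything runs in one process to keep the data flow visible.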
What are the features of Hadoop?
- Hive – SQL for NoSQL
  - Hive provides a SQL-like language called HiveQL
  - Provides a good entrance for SQL users :)
- Pig – Map/Reduce made easy
  - Creates data results given a reduced language
  - Reinvents SQL somehow
What are the features of Hadoop?
- Flume – Fault Tolerant transport
  - Divides into Sources, Channels, Sinks
  - Can have multiple of everything, which makes it fault tolerant
  - Many sources!
    - Avro, Exec, JMS, Syslog, HTTP, NetCat, Your Own (Java)
  - Many channels!
    - Memory, File, Your Own (Java)
  - Many sinks!
    - Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, Community sinks, Your Own (Java)
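The source, channel, sink split can be sketched as follows. Real Flume agents are wired together through configuration rather than code like this; every class and function below (`MemoryChannel`, `FailingSink`, `ListSink`, `run_source`, `drain`) is a hypothetical stand-in, and the failover rule is an assumption used to show why having more than one sink helps.

```python
from collections import deque


class MemoryChannel:
    """Toy in-memory channel: buffers events between a source and the sinks."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        return self._events.popleft() if self._events else None

    def put_back(self, event):
        self._events.appendleft(event)


class FailingSink:
    """A sink that is always down, to exercise the failover path."""

    def write(self, event):
        raise IOError("sink unavailable")


class ListSink:
    """A sink that simply records what it receives."""

    def __init__(self):
        self.delivered = []

    def write(self, event):
        self.delivered.append(event)


def run_source(lines, channel):
    """Source: push incoming events into the channel."""
    for line in lines:
        channel.put(line)


def drain(channel, sinks):
    """Deliver each event to the first sink that accepts it."""
    while True:
        event = channel.take()
        if event is None:
            return True
        for sink in sinks:
            try:
                sink.write(event)
                break
            except IOError:
                continue  # fail over to the next sink
        else:
            channel.put_back(event)  # every sink failed: keep the event
            return False


channel = MemoryChannel()
backup = ListSink()
run_source(["event 1", "event 2"], channel)
ok = drain(channel, [FailingSink(), backup])
```

Because the channel holds events until some sink accepts them, losing one sink does not lose data as long as another is available.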
What Hadoop looks like in a DC
- Components
  - Primary Namenode
    - Controls the whole cluster and knows where the data resides
    - Runs the job tracker to keep track of Map/Reduce jobs
    - Biggest point of failure; shadowing it is a potential option
  - Secondary Namenode
    - Performs secondary cleanup operations (periodically merges the edit log into the filesystem image)
  - Data Node
    - Stores all the information
    - Runs Map/Reduce
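The division of labour between the Namenode and the Data Nodes can be sketched like this. It is a toy model under stated assumptions, not Hadoop code: the class and function names, the 4-byte block size, and the round-robin allocation are all hypothetical, and replication is left out to keep the example short.

```python
class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only: which block of which file lives on which node."""

    def __init__(self, datanode_names):
        self.datanode_names = list(datanode_names)
        self.block_map = {}

    def allocate(self, filename, block_count):
        self.block_map[filename] = [
            (f"{filename}#{i}", self.datanode_names[i % len(self.datanode_names)])
            for i in range(block_count)
        ]
        return self.block_map[filename]

    def locate(self, filename):
        return self.block_map[filename]


def write_file(namenode, datanodes, filename, data, block_size=4):
    """Client asks the namenode where to put blocks, then writes to data nodes."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    locations = namenode.allocate(filename, len(chunks))
    for (block_id, node), chunk in zip(locations, chunks):
        datanodes[node].store(block_id, chunk)


def read_file(namenode, datanodes, filename):
    """Client asks the namenode for block locations, then reads from data nodes."""
    return b"".join(
        datanodes[node].read(block_id) for block_id, node in namenode.locate(filename)
    )


datanodes = {name: DataNode(name) for name in ("dn1", "dn2")}
namenode = NameNode(datanodes)
write_file(namenode, datanodes, "greeting.txt", b"hello hadoop")
content = read_file(namenode, datanodes, "greeting.txt")
```

The sketch also shows why the Namenode is the biggest point of failure: the Data Nodes still hold every block, but without `block_map` nothing knows how to reassemble a file.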
Questions?