AWSにおけるデータ分析入門 / Introduction To Data Analytics In AWS
hedgehog051
October 06, 2021
Transcript
Introduction to Data Analytics on AWS
Relic Inc., Kumada
Self-introduction
• Kan Kumada
• Infrastructure engineer since autumn […]
• Joined Relic Inc. in […]
I want to do data analysis…
I want to bring business intelligence to the business…
Momentum for data analysis keeps building.
But before that
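What does data analysis actually involve?
• Collect data such as results and track records (e.g. how much was sold, how much traffic there was)
• Derive some kind of insight from the collected data (e.g. what is selling, when, and to whom)
• Take action on the insight (e.g. adjust prices, adjust what is displayed, expand the target audience)
The important thing is to be clear about what you want to achieve by analyzing data.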
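Data-analytics-related services on AWS
I see. I have no idea.
When building a data analytics platform
• Collect data such as results and track records → how to collect it, and where to collect it
• Derive some kind of insight from the collected data → process it into an analyzable form, analyze, visualize
A rough classification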
Collect: Amazon Kinesis, Amazon Kinesis Video Streams, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon Managed Streaming for Apache Kafka, AWS Data Pipeline, AWS Data Exchange
Store: Amazon Redshift, AWS Lake Formation, Amazon S3
Process: Amazon EMR, AWS Glue, AWS Glue Elastic Views, AWS Glue DataBrew, Amazon Kinesis Data Analytics
Analyze: Amazon EMR, Amazon Athena, Amazon Kinesis Data Analytics, Amazon Redshift, Amazon QuickSight, Amazon OpenSearch Service
Visualize: Amazon Elasticsearch Service, Amazon QuickSight, Amazon OpenSearch Service
There is a lot here, but I feel like the direction is becoming a little clearer.
A quick look at each

Collect
Real-time streaming
Amazon Kinesis: umbrella name for the Kinesis services
Amazon Kinesis Video Streams: capture, process, and store streaming video
Amazon Kinesis Data Streams: capture, process, and store stream data
Amazon Kinesis Data Firehose: load stream data into AWS data stores
Amazon Managed Streaming for Apache Kafka: managed Apache Kafka for sending and receiving stream data
Other
AWS Data Pipeline: scheduled data movement and transformation
AWS Data Exchange: subscriptions to third-party data, such as article data provided by Reuters

Store
Amazon Redshift: data warehouse, a repository that gathers and organizes huge volumes of "structured data" from systems
AWS Lake Formation: builds a data lake, which keeps unprocessed data whose use has not yet been decided
Amazon S3: object storage that holds "structured data", "unstructured data", and so on

Process and analyze
Amazon EMR: combines big data frameworks and related OSS to run ETL, streaming processing, and analysis on large volumes of data
AWS Glue: serverless ETL (extract/transform/load)
AWS Glue Elastic Views (preview): builds materialized views by accessing multiple data stores and combining and copying their data
AWS Glue DataBrew: no-code data cleanup and normalization
Amazon Athena: runs ad hoc queries against S3
Amazon Kinesis Data Analytics: transforms and analyzes streaming data
Amazon Redshift: data warehouse that runs complex SQL queries

Visualize
Amazon QuickSight: serverless BI tool and visualization
Amazon OpenSearch Service & Kibana: real-time data search and visualization
When do you use which? The ones that look most important
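Amazon Kinesis Video Streams
・You have devices that generate video data, or applications that need to handle it
・You want to stream live or recorded video to browsers and smartphones over HLS
・You want real-time two-way media streaming or web browser streaming
・You want to feed video data to Rekognition Video (video recognition) or SageMaker (ML)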
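Amazon Kinesis Data Streams
・You want to collect logs and event data generated by servers and devices in real time and at high speed
・You want to collect data with a latency of one second or less
・You want to process streaming data with Lambda
・You want to forward streaming data to EC2
・You want to forward streaming data to Kinesis Data Analytics for real-time analysis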
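As a rough illustration of the log and event collection use case above, the following sketch sends one JSON event to a stream with boto3's `put_record`. The stream name, event fields, and partition key are hypothetical, and `send_event` assumes that boto3 is installed and AWS credentials are configured.

```python
import json


def build_put_record_params(stream_name, event, partition_key):
    """Build the keyword arguments for the Kinesis PutRecord call."""
    return {
        "StreamName": stream_name,
        # Kinesis stores opaque bytes, so the event is serialized as UTF-8 JSON.
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": partition_key,
    }


def send_event(stream_name, event, partition_key):
    """Send a single event to the stream (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("kinesis")
    return client.put_record(
        **build_put_record_params(stream_name, event, partition_key)
    )


# Hypothetical usage: one log line from a server named "web-1".
params = build_put_record_params(
    "app-logs", {"level": "info", "msg": "started"}, "web-1"
)
```

Keeping the parameter builder separate from the network call makes the payload easy to inspect locally before anything is sent.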
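Amazon Kinesis Data Firehose
・You want to deliver stream data to S3, Redshift, or OpenSearch Service
・You want to deliver data to those data stores in near real time (within 60 seconds)
・You want to deliver data to service providers such as Datadog, New Relic, or MongoDB
・You want to convert data to Apache Parquet or Apache ORC before delivering it to the data store
・You want to deliver to data stores without developing an application or managing infrastructure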
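For the delivery use cases above, a producer typically hands records to Firehose in batches. The sketch below groups events into batches for boto3's `put_record_batch`, which accepts at most 500 records per call. The delivery stream name and event shape are hypothetical. The sketch ignores the per-call size limit and does not retry records reported as failed, both of which a real producer would need to handle.

```python
import json

MAX_BATCH_SIZE = 500  # PutRecordBatch accepts at most 500 records per call


def to_firehose_batches(events, batch_size=MAX_BATCH_SIZE):
    """Serialize events as newline-terminated JSON and split them into batches."""
    records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def deliver(delivery_stream_name, events):
    """Send all events to Firehose (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the batching logic works without AWS access

    client = boto3.client("firehose")
    for batch in to_firehose_batches(events):
        client.put_record_batch(
            DeliveryStreamName=delivery_stream_name, Records=batch
        )


# Hypothetical usage: 1,201 events end up in three batches.
batches = to_firehose_batches([{"n": i} for i in range(1201)])
```

The trailing newline is a common convention so that objects landing in S3 contain one JSON document per line.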
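Amazon Kinesis Data Analytics
・You want to query streaming data in real time with standard SQL
・You want to analyze streaming data in real time with sub-second latency
・You want to use Apache Flink to integrate with various AWS services for streaming ETL
・You want to build analytics applications in SQL, Java, Scala, or Python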
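AWS Data Pipeline
・Non-real-time
・You want to move data periodically between AWS storage and compute services and on-premises systems
・You want to apply simple transformations and similar processing while moving data
・You want to move data between stores, for example RDS → DynamoDB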
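The latency figures on the collection slides above can be condensed into a small helper. This is only a sketch of the rule of thumb stated on the slides (one second or less, within 60 seconds, non-real-time). The function name and the single-parameter interface are made up for illustration, and a real selection would also weigh destinations, processing needs, and cost.

```python
def pick_ingestion_service(max_latency_seconds):
    """Suggest a collection service from the latency the use case can tolerate."""
    if max_latency_seconds <= 1:
        # Slides: collect data with a latency of one second or less.
        return "Amazon Kinesis Data Streams"
    if max_latency_seconds <= 60:
        # Slides: near-real-time delivery (within 60 seconds) to data stores.
        return "Amazon Kinesis Data Firehose"
    # Slides: non-real-time, periodic data movement.
    return "AWS Data Pipeline"


# Hypothetical usage: a nightly batch job tolerates a full day of latency.
print(pick_ingestion_service(24 * 60 * 60))
```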
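Amazon Redshift
・You want to analyze structured and semi-structured data
・You want to run complex SQL queries against large-scale (petabyte) data
・You want to analyze large volumes of data in bulk, rather than handle continuous writes and updates
・You want to use Redshift Spectrum to run SQL queries against data in S3
・You want to save query results to S3 for use by other AWS services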
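Amazon Athena
・Your data is in S3 and you want to run simple ad hoc queries
・You want to query files in formats such as CSV, JSON, ORC, and Parquet
・You want to run queries serverlessly
・No ETL required
・You want to output query results as CSV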
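An ad hoc query of the kind described above might be submitted with boto3's `start_query_execution`, as sketched below. The database name, the `access_logs` table, and the S3 output location are hypothetical. `run_query` assumes that boto3 is installed, that AWS credentials are configured, and that the table is already defined in the data catalog.

```python
def build_athena_query_params(sql, database, output_location):
    """Build the keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes the result file for the query under this S3 prefix.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location):
    """Submit the query and return its execution ID (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_query_params(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical usage: count requests per status code in an access-log table.
params = build_athena_query_params(
    "SELECT status, COUNT(*) AS requests FROM access_logs GROUP BY status",
    "analytics_demo",
    "s3://example-athena-results/adhoc/",
)
```

The call returns as soon as the query is accepted, so a caller would normally poll the execution status before reading the result from S3.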
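AWS Lake Formation
・You want to build a data lake easily
・You want to keep unprocessed data in one place, regardless of scale, for future analysis
・You want to retain the unprocessed data even after processing it
・Various departments in the organization each want to use the data for their own analysis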
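Amazon EMR
・You want to do data processing with flexibly customized OSS
・You want to run ETL (extract/transform/load) on large datasets
・You want to do ML with Apache Spark MLlib, TensorFlow, or Apache MXNet
・You want to analyze clickstream data in S3 with Apache Spark or Apache Hive
・You want to do real-time streaming with Apache Flink and Apache Spark Streaming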
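AWS Glue
・You want serverless, medium-scale ETL (extract/transform/load)
・You want to run ETL on data in Redshift, S3, RDS, DynamoDB, and so on
・You want to crawl data sources periodically, update the Data Catalog, and transform the data automatically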
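Amazon OpenSearch Service & Kibana
・You want to stand up an OpenSearch cluster easily and analyze application log data
・You want to make applications, websites, or data lake catalogs searchable
・You want to collect infrastructure logs and metrics and visualize them in real time
・You want to visualize stream data in real time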
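Amazon QuickSight
・You want a serverless BI tool
・You want to visualize data from a variety of data sources (S3, RDS, Athena, Redshift, OpenSearch, CSV, JSON, and so on)
・You want periodic reports with graphs and data rather than real-time views
・You want to analyze data using a variety of charts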
QuickSight visualization examples: https://aws.amazon.com/jp/quicksight/gallery/
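Points to consider when selecting services
Summary
What the analysis is for affects the granularity of collection, analysis, and visualization, so make that purpose clear first.
Thank you very much.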