Generative AI FM Training and Deployment with NVIDIA NeMo and NIM on Amazon SageMaker

Title: Efficient Generative AI Development with Amazon SageMaker and NVIDIA

Abstract:
This deck introduces Amazon SageMaker AI and related services for efficient model training and inference with NVIDIA NeMo and NIM on AWS.
====
Event title: NVIDIA × AWS Presents: The Front Lines of AI Model Development - Maximizing AI Training Efficiency with NVIDIA NeMo, NVIDIA NIM, and AWS!

Event overview:
This event focuses on the latest AI model development environments and on optimizing training efficiency.

AWS presents efficient generative AI development using services such as Amazon SageMaker AI, while NVIDIA explains the features and benefits of NVIDIA NeMo and NVIDIA NIM, which accelerate building the latest AI models and applications.

As a practical case study, Stockmark presents model development and accuracy improvement with NVIDIA NeMo, which can also be used in development environments on AWS, including concrete ways to use tools such as NeMo Aligner and Reranker.

Turing then shares hands-on work from autonomous-driving AI development: optimizing GPU resources, designing hybrid environments, and in particular building GPU compute environments for multimodal foundation model development, with a comparison of on-premises and cloud.

A networking reception follows the sessions, giving participants an opportunity to exchange information. For AI developers, this is a valuable chance to learn about the latest technology trends and practical know-how.

Yoshitaka Haribara

May 27, 2025

Transcript

  1. Generative AI FM Training/Deployment with NVIDIA NeMo/NIM on Amazon SageMaker
     Yoshitaka Haribara, Ph.D., Sr. GenAI Startup Solutions Architect, Amazon Web Services Japan G.K.
  2. NVIDIA AI Summit Japan 2024 with Howard Wright (VP, Startups, NVIDIA, ex-AWS)
     Yoshitaka Haribara, Ph.D. (X: @_hariby)
     - Sr. GenAI Startup Solutions Architect, Amazon Web Services Japan G.K.
     - Guest Associate Professor, Center for Quantum Information and Quantum Biology (QIQB), Osaka University
     Background:
     - 2013: B.Sc. in Mathematics, Osaka University (band, drums, applied mathematics and special functions such as Bessel functions)
     - 2018: Ph.D., Graduate School of Information Science and Technology, The University of Tokyo (optical Ising machines, SIMD/MIMD/FPGA, quantized neural networks)
     - 2018: Joined AWS Japan (cloud, machine learning and generative AI, quantum computing, startups)
     Hobbies:
     - Band and drums (YouTube/Instagram: @dr.hariby)
     - Recently started releasing music on streaming services: https://www.tunecore.co.jp/artists/hari-psycho-experience
  3. Agenda
     • AWS GenAI Service Stack (EC2 NVIDIA GPU Instances, SageMaker AI)
     • NeMo on SageMaker HyperPod
     • NIM on SageMaker AI
  4. AWS Generative AI Stack
     • Applications that boost productivity: Amazon Q Business (insights and automation), Amazon Q Developer (software development lifecycle)
     • Models and tools for building generative AI apps: Amazon Bedrock (Amazon models | partner models)
     • Infrastructure for building and training AI models: AWS Trainium, AWS Inferentia, GPUs, high performance computing (HPC), Amazon SageMaker AI (managed infrastructure)
  5. AWS Generative AI Stack (infrastructure layer)
     • Infrastructure for building and training AI models: AWS Trainium, AWS Inferentia, GPUs, high performance computing (HPC), Amazon SageMaker AI (managed infrastructure)
  6. GPU, AWS ML accelerator, and FPGA-based EC2 instances: a broad and deep accelerated computing portfolio (some in preview)
     • NVIDIA GPUs (B200, H200, H100, A100, L4, L40S, A10G, T4): P6-B200, P5en, P5e, P5, P4de, P4d, G6e, G6, G5, G4, GB200
     • AWS ML chips (AWS Trainium, AWS Inferentia): Trn3, Trn2, Trn1, Inf2, Inf1
     • AI/ML accelerators, ASICs, and FPGAs: DL1 (Gaudi accelerator), DL2q (Qualcomm Cloud AI 100), VT1 (Xilinx accelerator), F1/F2 (Xilinx FPGA), Radeon GPU instances
  7. NVIDIA GPU Instances for ML Training
     • P4 (NVIDIA A100): up to 156 teraflops FP64 compute and up to 640 GB HBM2; up to 400 Gbps networking (EFA) and 600 GB/s device-to-device interconnect
     • P5 (NVIDIA H100): up to 536 teraflops FP64 compute and up to 640 GB HBM3; up to 3,200 Gbps networking (EFA) and 900 GB/s device-to-device interconnect
     • P5en (NVIDIA H200): up to 536 teraflops FP64 compute and up to 1,128 GB HBM3e; up to 3,200 Gbps networking (EFA) and 900 GB/s device-to-device interconnect
     • P6-B200 (NVIDIA B200): powered by NVIDIA B200 GPUs; EC2 UltraClusters for accelerating generative AI training and inference at massive scale
  8. NVIDIA GPU Instances for ML Inference
     • G5 (NVIDIA A10G): up to 8 NVIDIA A10G GPUs; up to 192 GB vRAM @ 600 GB/s
     • G6 (NVIDIA L4): up to 8 NVIDIA L4 GPUs; up to 192 GB vRAM @ 300 GB/s
     • G6e (NVIDIA L40S): up to 8 NVIDIA L40S GPUs; up to 384 GB vRAM @ 860 GB/s
  9. Upcoming accelerated computing instances
     • Trn3 (AWS Trainium3): designed to deliver the highest-performance, most energy-efficient AI model training infrastructure in the cloud
     • P* (GB200): featuring GB200 NVL72, with 72 Blackwell GPUs and 36 Grace CPUs interconnected by fifth-generation NVIDIA NVLink™; EC2 UltraClusters connected with Amazon's powerful networking (EFA) and supported by advanced virtualization (AWS Nitro System)
     • DGX Cloud: AI platform co-engineered by AWS and NVIDIA, powered by NVIDIA GB200 superchips; access to the infrastructure and software needed to build and deploy advanced generative AI models on AWS
  10. Second-generation EC2 UltraClusters: the largest-scale ML infrastructure in the cloud
      • Up to 20,000 H200/H100 GPUs (P5) or 100,000 Trainium accelerators (Trn2)
      • 3,200 Gbps Elastic Fabric Adapter (EFA), redesigned for 16x larger scale and lower latency with third-generation EFA
      • Nonblocking petabit-scale network infrastructure
      • Scalable, low-latency, high-throughput storage from Amazon FSx for Lustre (petabytes per second of throughput, billions of IOPS)
  11. Amazon SageMaker (next generation)
      • Unified Studio spanning SQL analytics (Amazon Redshift), data processing (Amazon EMR, AWS Glue, Amazon Athena), model development (Amazon SageMaker AI), gen AI app development (Amazon Bedrock), streaming (Amazon MSK, Amazon Kinesis), business intelligence (Amazon QuickSight), and search analytics (Amazon OpenSearch Service); some components coming soon
      • Built on a lakehouse with data & AI governance
  12. Amazon SageMaker: data and AI governance
      • Governance built in to discover, share, and collaborate on data and AI securely
      • Amazon SageMaker Catalog, built on Amazon DataZone, spanning data, models, and gen AI compute
  13. SageMaker Lakehouse: unify access to all your data
      • Amazon S3 data lakes and Amazon Redshift data warehouses
      • Zero-ETL integrations: Aurora, RDS, DynamoDB, OpenSearch vector data, and SaaS sources (Salesforce, Salesforce Pardot, ServiceNow, Zoho CRM, Zendesk, SAP, Facebook Ads, Instagram Ads)
      • Streaming data: MSK, Kinesis
      • Federated querying plus hundreds of AWS Glue connectors
  14. Purpose-built infrastructure for FM training: two options for model training on SageMaker AI
      • Amazon SageMaker HyperPod: resilient, self-orchestrated infrastructure for maximum resource control; customize and manage cluster orchestration (Slurm or EKS); schedule workloads to maximize cluster utilization across teams
      • Fully managed training jobs: fully managed, fault-tolerant infrastructure for large-scale, cost-effective training; focus on model building rather than infrastructure; access to flexible, on-demand GPU clusters with pay-as-you-go options (a minimal sketch follows below)
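      To make the fully managed option concrete, here is a minimal sketch using the SageMaker Python SDK. The entry-point script, framework/Python versions, hyperparameters, and instance choice are illustrative assumptions, not values from the deck:

          # Minimal sketch of a fully managed SageMaker training job.
          import sagemaker
          from sagemaker.pytorch import PyTorch

          session = sagemaker.Session()
          role = sagemaker.get_execution_role()  # IAM role for the training job

          estimator = PyTorch(
              entry_point="train.py",            # hypothetical training script
              role=role,
              framework_version="2.4",           # illustrative versions
              py_version="py311",
              instance_type="ml.p5.48xlarge",    # NVIDIA H100 instance
              instance_count=2,
              hyperparameters={"epochs": 1},
              sagemaker_session=session,
          )

          # Launches managed, pay-as-you-go infrastructure; data is pulled
          # from S3 and the cluster is torn down when the job finishes.
          estimator.fit({"train": "s3://my-bucket/train/"})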
  15. Model deployment on Amazon SageMaker AI
      • Real-time synchronous response: invoke → response
      • Near real-time asynchronous response: invoke → response
      • Offline batch inference: submit → complete
      • Deployment options: single model deployment (single container or multi-container), multi-model deployment, multi-LoRA adapter hosting, serverless; on GPUs or CPUs
      • Stack: model, container, infrastructure
  16. Model Training with NVIDIA NeMo on SageMaker HyperPod
  17. NVIDIA NeMo framework on Amazon SageMaker HyperPod clusters
      This part of the guidance demonstrates how to deploy SageMaker HyperPod clusters based on HPC (Slurm).
      Architecture: admin/DevOps engineers and data scientists/ML engineers reach the cluster over SSH via SSM; the HyperPod VPC (service account) contains the controller node and HyperPod compute nodes connected by Elastic Fabric Adapter, with Amazon FSx for Lustre storage and an S3 bucket, and VPC peering to the customer VPC; AWS IAM Identity Center, Amazon Managed Service for Prometheus, and Amazon Managed Grafana round out access and observability.
      1. The account team reserves compute capacity with On-Demand Capacity Reservations (ODCR) or Amazon SageMaker HyperPod Flexible Training Plans.
      2. Admin/DevOps engineers use the SageMaker HyperPod VPC stack to deploy networking, storage, and Identity and Access Management (IAM) resources.
      3. Admin/DevOps engineers push lifecycle scripts to the S3 bucket created in the previous step.
      4. Admin/DevOps engineers use the AWS CLI to create the SageMaker HyperPod cluster (a boto3 sketch of this call follows after this list).
      5. Admin/DevOps engineers generate key pairs to establish access to the controller node of the HyperPod cluster. Once the cluster is created, admins can test SSH access to the controller and compute nodes and examine the cluster.
      6. Admin/DevOps engineers configure IAM to use Amazon Managed Service for Prometheus to collect cluster metrics and Amazon Managed Grafana to set up the observability stack.
      7. Admin/DevOps engineers can make further changes to the cluster using the HyperPod CLI.
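      A minimal sketch of step 4 using boto3 instead of the AWS CLI. The cluster name, instance types, counts, role ARN, and S3 lifecycle-script location are illustrative assumptions:

          import boto3

          sm = boto3.client("sagemaker")

          response = sm.create_cluster(
              ClusterName="nemo-hyperpod-cluster",        # hypothetical name
              InstanceGroups=[
                  {
                      "InstanceGroupName": "controller",
                      "InstanceType": "ml.m5.xlarge",
                      "InstanceCount": 1,
                      "LifeCycleConfig": {
                          "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                          "OnCreate": "on_create.sh",     # pushed in step 3
                      },
                      "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodRole",
                  },
                  {
                      "InstanceGroupName": "compute",
                      "InstanceType": "ml.p5.48xlarge",   # NVIDIA H100 GPUs
                      "InstanceCount": 2,
                      "LifeCycleConfig": {
                          "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                          "OnCreate": "on_create.sh",
                      },
                      "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodRole",
                  },
              ],
          )
          print(response["ClusterArn"])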
  18. Demo Video: Setting up NVIDIA NeMo on HyperPod easily
  19. Model Deployment with NVIDIA NIM on SageMaker AI
  20. NVIDIA NIM
  21. NVIDIA NIM public ECR gallery on AWS: https://gallery.ecr.aws/nvidia/nim
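      The deployment code on the next slide assumes that nim_image, role, and NGC_API_KEY are already defined. A minimal sketch of that setup; the image URI below is an illustrative placeholder (check the gallery above for actual repositories and tags), and SageMaker hosting generally expects the image to be available in an Amazon ECR repository in your region:

          import os
          import sagemaker

          role = sagemaker.get_execution_role()    # SageMaker execution role ARN
          NGC_API_KEY = os.environ["NGC_API_KEY"]  # NGC key the container uses to fetch weights

          # Illustrative placeholder: a NIM container copied into your own
          # ECR repository in the endpoint's region.
          nim_image = "<account-id>.dkr.ecr.<region>.amazonaws.com/nim/mixtral-8x7b-instruct:latest"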
  22. Deploy with NIM on SageMaker (Mixtral 8x7B)

      import boto3

      sm = boto3.client("sagemaker")

      sm_model_name = "nim-mixtral-8x7b-instruct"
      instance_type = "ml.p4d.24xlarge"

      container = {
          "Image": nim_image,
          "Environment": {"NGC_API_KEY": NGC_API_KEY},
      }

      create_model_response = sm.create_model(
          ModelName=sm_model_name,
          ExecutionRoleArn=role,
          PrimaryContainer=container,
      )

      create_endpoint_config_response = sm.create_endpoint_config(
          EndpointConfigName=sm_model_name,
          ProductionVariants=[
              {
                  "InstanceType": instance_type,
                  "InitialVariantWeight": 1,
                  "InitialInstanceCount": 1,
                  "ModelName": sm_model_name,
                  "VariantName": "AllTraffic",
                  "ContainerStartupHealthCheckTimeoutInSeconds": 850,
              }
          ],
      )

      create_endpoint_response = sm.create_endpoint(
          EndpointName=sm_model_name,
          EndpointConfigName=sm_model_name,
      )
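      Endpoint creation is asynchronous, so before invoking the endpoint you typically wait for it to reach InService. A minimal sketch using a boto3 waiter with the sm client from the slide above; the polling values are illustrative:

          # NIM containers can take several minutes to download model weights.
          waiter = sm.get_waiter("endpoint_in_service")
          waiter.wait(
              EndpointName=sm_model_name,
              WaiterConfig={"Delay": 30, "MaxAttempts": 60},  # poll every 30 s, up to ~30 min
          )
          print(sm.describe_endpoint(EndpointName=sm_model_name)["EndpointStatus"])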
  23. Inference with NIM on SageMaker (Mixtral 8x7B)

      import json

      import boto3

      client = boto3.client("sagemaker-runtime")

      payload_model = "mistralai/mixtral-8x7b-instruct-v0.1"
      messages = [
          {"role": "user", "content": "Hello! How are you?"},
          {"role": "assistant", "content": "Hi! I am quite well, how can I help you today?"},
          {"role": "user", "content": "Explain to me in detail what llm serving frameworks are"},
      ]
      payload = {
          "model": payload_model,
          "messages": messages,
          "max_tokens": 1024,
          "stream": True,
      }

      response = client.invoke_endpoint_with_response_stream(
          EndpointName=sm_model_name,
          Body=json.dumps(payload),
          ContentType="application/json",
          Accept="application/jsonlines",
      )
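      The slide stops at the invoke call; the response body is an event stream that still has to be consumed. A minimal sketch of reading it, assuming the NIM endpoint emits OpenAI-style JSON chunks over application/jsonlines (the exact framing is an assumption, so treat the parsing as illustrative):

          # Each event carries a PayloadPart with raw bytes; chunks can split
          # across JSON lines, so buffer until a full line is available.
          buffer = b""
          for event in response["Body"]:
              buffer += event["PayloadPart"]["Bytes"]
              while b"\n" in buffer:
                  line, buffer = buffer.split(b"\n", 1)
                  line = line.strip()
                  if not line or line == b"data: [DONE]":
                      continue
                  if line.startswith(b"data:"):
                      line = line[5:].strip()
                  chunk = json.loads(line)
                  # OpenAI-style streaming chunk: print the incremental text
                  delta = chunk["choices"][0].get("delta", {})
                  print(delta.get("content") or "", end="", flush=True)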
  24. References
      • NeMo on SageMaker HyperPod: https://aws.amazon.com/blogs/machine-learning/running-nvidia-nemo-2-0-framework-on-amazon-sagemaker-hyperpod/
      • NIM: NVIDIA AI Enterprise (AWS Marketplace): https://aws.amazon.com/marketplace/pp/prodview-ozgjkov6vq3l6
      • Mixtral 8x7B notebook: https://github.com/aws-samples/mistral-on-aws/blob/main/notebooks/NIM-inference-samples/mixtral_8x7b_Nvidia_nim.ipynb
  25. Thank you!
      © 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark.