
20231102 Introduction to NeMo Framework, NVIDIA's Solution for Generative AI

Murakami Mana
November 28, 2023


In recent years, generative AI technology has advanced remarkably, and its industrial applications are attracting growing attention.
NeMo Framework, NVIDIA's framework for generative AI, can train LLMs and run inference with trained models at high speed on GPUs.
This session will give an overview of NVIDIA's work on generative AI and of the NeMo Framework.


Transcript

  1. How has NVIDIA contributed to acceleration of AI?

    NVIDIA has been a pioneer in the field of AI since the very beginning. Our GPU platform has enabled the rapid development of AI – from the training of neural networks, to inference in the data center, on-device AI in the car and in the cloud, and the deployment of AI to tackle challenging problems like conversational AI and translation. NVIDIA's GPU-accelerated computing platform is the engine of AI – it is the most important computing platform of our time.
    Examples shown on the slide: Wikipedia, NLLB-200, CODEX, MegaMolBART, e-Diffi, GPT-3. **Generated using NVIDIA NeMo service 530B
  2. Transformer and LLM Research Papers Per Year

    [Chart: "Transformer and LLM Research Papers Per Year", 2017–2022, axes labeled "M Parameters" and "year", annotated with TRANSFORMER, BERT, GPT-3, CODEX, MegaMolBART, NLLB-200, Dall-E 2, and ChatGPT.]
  3. LLM development workflow

    STEP 1: Pre-Training
    STEP 2: Fine-Tuning (SFT and RLHF)
    STEP 3: Inference
  4. LLM development workflow (detail)

    STEP 1: Pre-Training
    STEP 2: Fine-Tuning (SFT and RLHF)
    STEP 3: Inference
    A checkpoint is handed from one step to the next.
  5. NeMo Framework – generative AI training and inference

    ✓ 3D Parallelism: Data, Tensor & Pipeline, Sequence Parallelisms, Selective Activation Recomputation
    ✓ LLM customization: Adapters, RLHF, AliBi, SFT
    ✓ Cluster launchers: SLURM, Nephele, Kubernetes (K8s)
    ✓ LLMs: BERT >100B, T5-MoE, T5, GPT-3, Inform ※1
    ✓ Multi-modal: Stable Diffusion, ViT, ViT-CLIP, Instruct-Pix2Pix, Imagen ※1
    Runs on NVIDIA DGX SuperPODs, NVIDIA DGX Cloud, and NVIDIA DGX Systems
    https://developer.nvidia.com/nemo-framework
    ※1 Inform, Multi-modal
  6. NeMo Framework software stack

    NeMo Training container: NeMo Toolkit, NeMo Megatron Launcher, Megatron Core, PyTorch Lightning, on NGC PyTorch – 3D-parallel training of generative AI / LLM models on NVIDIA GPUs
    NeMo Inference container: TensorRT-LLM and Triton Inference Server – GPU-accelerated LLM inference
  7. LLM development workflow

    STEP 1: Pre-Training
    STEP 2: Fine-Tuning (SFT and RLHF)
    STEP 3: Inference
  8. Tensor & Pipeline Parallelism, Sequence Parallelism, Selective Activation Recomputation

    [Figure: per-GPU timelines (GPU 0, GPU 1, GPU 2, ...) for a batch, marking which activations are saved and which are recomputed. Traditionally, for LLMs (>175GB) every activation is recomputed.]
    A minimal sketch of the recompute-instead-of-store idea follows below.
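
    The recompute-instead-of-store trade-off can be illustrated with plain PyTorch activation checkpointing (recent PyTorch assumed for the use_reentrant argument). This is only a toy sketch with made-up names and sizes, not NeMo's selective activation recomputation, which picks specific transformer activations to recompute rather than recomputing everything:

    import torch
    from torch import nn
    from torch.utils.checkpoint import checkpoint

    class CheckpointedMLP(nn.Module):
        """Toy model whose block activations are dropped in the forward pass
        and recomputed during backward, trading extra compute for memory."""
        def __init__(self, hidden=1024, layers=4):
            super().__init__()
            self.blocks = nn.ModuleList(
                nn.Sequential(nn.Linear(hidden, hidden), nn.GELU()) for _ in range(layers)
            )

        def forward(self, x):
            for block in self.blocks:
                # checkpoint() stores only the block input; the block is re-run
                # in the backward pass to regenerate its intermediate activations.
                x = checkpoint(block, x, use_reentrant=False)
            return x

    model = CheckpointedMLP()
    out = model(torch.randn(8, 1024, requires_grad=True))
    out.sum().backward()  # activations inside each block are recomputed here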
  9. 3D Parallelism (NVIDIA Megatron)

    Reference: Deepak Narayanan et al., "Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM", 2021
    DGX A100 interconnect bandwidth: NVLink (gen3) 300 GB/s within a node, InfiniBand (HDR) 25 GB/s between nodes – communication-heavy tensor parallelism is therefore kept inside a node, while pipeline and data parallelism span nodes.
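
    To make one of the three axes concrete, the sketch below shows column-wise tensor parallelism for a single linear layer in NumPy: the weight matrix is split across two hypothetical GPUs, each computes its partial output independently, and concatenating the partials reproduces the full result. It is an illustrative toy under assumed shapes, not Megatron's implementation, which combines these partial GEMMs with communication collectives:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))    # activations: [batch, hidden_in]
    w = rng.standard_normal((8, 16))   # layer weight: [hidden_in, hidden_out]

    # Column-parallel split across 2 "GPUs": each rank owns half the output columns.
    w_shards = np.split(w, 2, axis=1)

    # Each rank computes its partial output locally ...
    partials = [x @ shard for shard in w_shards]

    # ... and gathering along the column axis reconstructs the full output.
    y_parallel = np.concatenate(partials, axis=1)

    assert np.allclose(y_parallel, x @ w)  # matches the unsharded computation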
  10. FSDP (Fully Sharded Data Parallel) / ZeRO-3

    Available in PyTorch (FSDP) and DeepSpeed (ZeRO-3); sits between pure Data Parallelism and Model Parallelism.
    How does it differ from 3D Parallelism? When scaling to N GPUs, the input data is still split into N parts exactly as in Data Parallelism, while parameters, gradients, and optimizer states are sharded across the N GPUs.
    https://www.deepspeed.ai/2021/03/07/zero3-offload.html
    [Figure: Data parallel vs. ZeRO-3]
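
    For reference, wrapping a model in PyTorch's built-in FSDP looks roughly like the sketch below. The model and sizes are placeholders and the process-group setup assumes a torchrun launch; NeMo itself drives large-scale parallelism through Megatron Core rather than this API:

    import torch
    import torch.distributed as dist
    from torch import nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes launch via torchrun, which sets RANK / WORLD_SIZE / MASTER_ADDR.
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()

    # FSDP shards parameters, gradients, and optimizer state across all ranks
    # (ZeRO-3 style); each rank still consumes its own slice of the input batch,
    # exactly as in plain data parallelism.
    model = FSDP(model)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()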
  11. 3D Parallelism vs. FSDP
  12. 3D Parallelism vs. FSDP (continued)
  13. NeMo Framework training performance (pre-training on 300B tokens)

    Time to train 300B tokens, in days (A100, BF16), pre-training with NeMo Framework:

    Model        3072 GPUs        1600 GPUs        800 GPUs         480 GPUs        160 GPUs        64 GPUs
                 (384 DGX A100)   (200 DGX A100)   (100 DGX A100)   (60 DGX A100)   (20 DGX A100)   (8 DGX A100)
    GPT-3: 2B    0.2              0.3              0.6              1.1             3.2             8.0
    GPT-3: 5B    0.4              0.8              1.6              2.7             8.0             20.0
    GPT-3: 20B   1.7              3.2              6.4              10.7            32.0            79.9
    GPT-3: 43B   3.6              6.9              13.7             22.9            68.7            171.7

    Rule of thumb: seconds ≈ 8PT / (nX), where P = model parameters, T = tokens, n = GPU count, X = tFLOPS per GPU (A100 theoretical peak ~312 tFLOPS; measured average ~163 tFLOPS).
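
    The rule of thumb translates directly into a quick estimator. The sketch below is only a rough guide: 163 tFLOPS is an average, so sustained throughput (and therefore the measured days in the table above) varies with model size and cluster scale:

    def days_to_train(params, tokens, n_gpus, tflops_per_gpu=163):
        """Estimate pre-training time in days from seconds ≈ 8*P*T / (n*X)."""
        seconds = 8 * params * tokens / (n_gpus * tflops_per_gpu * 1e12)
        return seconds / 86400

    # Example: GPT-3 20B on 300B tokens with 160 A100 GPUs.
    print(f"{days_to_train(20e9, 300e9, 160):.1f} days")
    # ≈ 21 days with the assumed average throughput; the measured value in the
    # table (32.0 days) implies lower sustained tFLOPS for that configuration.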
  14. LLM development workflow

    STEP 1: Pre-Training
    STEP 2: Fine-Tuning (SFT and RLHF)
    STEP 3: Inference
  15. Using publicly available pre-trained models for generative AI with NVIDIA NeMo Framework

    Pre-trained models can be pulled from Hugging Face or NGC. The example below loads the Hugging Face model cl-tohoku/bert-base-japanese into a NeMo text-classification model:

    from nemo.collections import nlp as nemo_nlp
    from nemo.utils.exp_manager import exp_manager
    import pytorch_lightning as pl
    from omegaconf import OmegaConf

    # Update config settings
    config = OmegaConf.load("text_classification_config.yaml")
    config.model.tokenizer.vocab_file = "vocab.txt"
    config.model.dataset.num_classes = 2
    config.model.train_ds.file_path = "train_nemo_format.tsv"
    config.model.validation_ds.file_path = "dev_nemo_format.tsv"
    config.model.language_model.pretrained_model_name = "cl-tohoku/bert-base-japanese"

    trainer = pl.Trainer(**config.trainer)
    model = nemo_nlp.models.TextClassificationModel(cfg=config.model, trainer=trainer)
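
    The imported exp_manager is normally wired up before training starts; assuming the YAML carries an exp_manager section (as NeMo's example configs do), the continuation would be roughly:

    # Set up logging and checkpoint directories from the config, then train.
    exp_manager(trainer, config.get("exp_manager", None))
    trainer.fit(model)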
  16. Using community models with NVIDIA NeMo Framework

    Hugging Face checkpoints such as Llama 2 and StarCoder can be converted to the NeMo format. Example: converting Llama-2-7b-hf from the Hugging Face format to the NeMo format:

    #!/bin/sh
    git-lfs clone https://huggingface.co/meta-llama/Llama-2-7b-hf
    python3 /opt/NeMo/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
        --in-file=./Llama-2-7b-hf/ --out-file=llama2-7b.nemo

    (See the NeMo documentation: convert-llama2-from-huggingface-format-to-nemo-format)
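
    Once converted, the .nemo file can be restored like any other NeMo checkpoint; a minimal sketch (single-GPU trainer assumed, with the SFT/PEFT recipes on the following slides building on top of this):

    import pytorch_lightning as pl
    from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
    from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

    # Megatron-based NeMo models need a trainer at restore time.
    trainer = pl.Trainer(devices=1, accelerator="gpu", strategy=NLPDDPStrategy())
    model = MegatronGPTModel.restore_from("llama2-7b.nemo", trainer=trainer)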
  17. NeMo's model customization tools

    Arranged along axes of data, compute & investment and accuracy for specific use-cases:

    PROMPT ENGINEERING: few-shot learning, chain-of-thought reasoning, system prompting
    PROMPT LEARNING: prompt tuning, p-tuning
    PARAMETER EFFICIENT FINE-TUNING: Adapters, LoRA, IA3
    INSTRUCTION TUNING: SFT, RLHF
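
    Of the PEFT options above, LoRA is a representative example: the pre-trained weight stays frozen and only a low-rank update B·A is trained. The sketch below is a generic PyTorch illustration of that idea (the class name, rank, and alpha values are made up), not NeMo's LoRA implementation:

    import torch
    from torch import nn

    class LoRALinear(nn.Module):
        """Frozen base Linear plus a trainable low-rank update."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():  # the pre-trained weight is frozen
                p.requires_grad = False
            self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # down-projection A
            self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # up-projection B
            nn.init.zeros_(self.lora_b.weight)  # B = 0, so training starts from the base model
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + self.lora_b(self.lora_a(x)) * self.scale

    layer = LoRALinear(nn.Linear(1024, 1024))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)  # only the two small LoRA matrices are updated during fine-tuning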
  18. Customization resources for NeMo Framework

    INSTRUCTION TUNING
      [Playbook] NeMo Framework Supervised Fine-tuning (SFT) with Llama2
      [Documentation] Reinforcement Learning from Human Feedback
      [Documentation] Instruction Following Taught by Supervised Fine-Tuning (SFT)
      [Documentation] Model Fine-Tuning
      [Jupyter Notebook] SFT example for Text Classification
    PEFT
      [Playbook] NeMo Framework PEFT with Llama2
      [Documentation] Generalized PEFT Framework
      PEFT Training and Inference for GPT-style Models
      PEFT Training and Inference for mT5/T5-style Models
      [Jupyter Notebook] Optimize GPT model for Extractive Q&A using LoRA
    PROMPT LEARNING
      [Documentation] Model Prompt Learning
  19. LLM development workflow

    STEP 1: Pre-Training
    STEP 2: Fine-Tuning (SFT and RLHF)
    STEP 3: Inference
  20. TensorRT-LLM: a library for LLM inference on NVIDIA GPUs

    https://github.com/NVIDIA/TensorRT-LLM
    Models are defined with a PyTorch-like Python API, for example:

    # Define a new activation
    def silu(input: Tensor) -> Tensor:
        return input * sigmoid(input)

    # Implement models like in DL frameworks
    class BertModel(Module):
        def __init__(...):
            self.layers = ModuleList([...])

        def forward(...):
            hidden = self.embedding(...)
            for layer in self.layers:
                hidden = layer(hidden)
            return hidden

    Integrates with Triton Inference Server for LLM batching and supports multi-GPU / multi-node serving. Numbers are preliminary, based on internal evaluation.
  21. TensorRT-LLM compared with TensorRT and FasterTransformer

    TensorRT-LLM is the successor to FasterTransformer for LLM inference.
    Models are described through a PyTorch-style Python API, and that API builds a TensorRT engine.
    Building blocks: FasterTransformer, TensorRT, OpenAI Triton, and CUTLASS.
  22. TensorRT-LLM: key optimizations and benefits

    Optimizations: custom MHA kernels, in-flight batching, paged attention, quantized KV cache
    Benefits: lower TCO and lower energy per inference (comparison basis: H100 FP8 with in-flight batching in TensorRT-LLM vs. A100 FP16 PyTorch)
  23. Other NeMo Framework features

    Data curation: [Documentation] NeMo Data Curator
    Hyperparameter search: [GTC Session] s41904 "How to Avoid the Staggering Cost of Training State-of-the-art Large Language Models"
    Guardrails for LLM applications: NeMo-Guardrails
  24. Summary

    NeMo Framework covers the LLM workflow from Pre-Training through Inference.
    Training scales out with NVIDIA Megatron's 3D parallelism.
    Hugging Face checkpoints can be converted to NVIDIA (NeMo) checkpoints.
    Models can be customized with Instruction Tuning, PEFT, and Prompt Learning.
    Inference is served with TensorRT-LLM + Triton Inference Server.
  25. Current model support and how to get the NeMo Framework containers

    Language Models: GPT, T5, mT5, T5-MoE, BERT – Now Available! (Download Now – Language)
    Text-to-Image Models: Stable Diffusion v1.5/v2.0, Imagen
    Image-to-Image Models: Dreambooth, InstructPix2Pix
    Also: Vision Transformers, CLIP
    Multimodal containers: Apply Now (Coming Soon!)
    Example generations – Dreambooth prompt: "A 'sks' dog mecha robot."; InstructPix2Pix instruction: "Make it on a beach"
  26. Appendix: further resources

    NVIDIA Generative AI Solutions
    NVIDIA NeMo Framework
    NeMo Guardrails
    Tech blogs:
      What are Large Language Models?
      What Are Large Language Models Used For?
      What are Foundation Models?
      How To Create A Custom Language Model?
      Adapting P-Tuning to Solve Non-English Downstream Tasks
      NVIDIA AI Platform Delivers Big Gains for Large Language Models
      The King's Swedish: AI Rewrites the Book in Scandinavia
      No Hang Ups With Hangul: KT Trains Smart Speakers, Customer Call Centers With NVIDIA AI
    eBook asset
    GTC sessions:
      How to Build Generative AI for Enterprise Use-cases
      Leveraging Large Language Models for Generating Content
      Power Of Large Language Models: The Current State and Future Potential
      Generative AI Demystified
      Efficient At-Scale Training and Deployment of Large Language Models
      Hyperparameter Tool