
20231102 Introduction to Nemo Framework, NVIDIA's Solution for Generative AI

Murakami Mana
November 28, 2023


In recent years, generative AI technology has advanced remarkably, and its industrial applications are drawing growing attention.
Nemo Framework, NVIDIA's framework for generative AI, uses GPUs to accelerate both LLM training and inference with trained models.
This session gives an overview of NVIDIA's work on generative AI and of the Nemo Framework.



Transcript

  1. Introduction to Nemo Framework, NVIDIA's Solution for Generative AI
    Mana Murakami, Senior Solution Architect, NVIDIA | Nov 2nd 2023


  2. Agenda
    • NVIDIA and generative AI
    • NVIDIA Nemo Framework


  3. Generative AI
    [Figure: examples of generative AI models and data — Wikipedia, NLLB-200, CODEX, MegaMolBART, e-Diffi, GPT-3]
    How has NVIDIA contributed to acceleration of AI?
    "NVIDIA has been a pioneer in the field of AI since the very beginning. Our GPU platform has enabled the rapid development of AI – from the training of neural networks, to inference in the data center, on-device AI in the car and in the cloud, and the deployment of AI to tackle challenging problems like conversational AI and translation. NVIDIA's GPU-accelerated computing platform is the engine of AI – it is the most important computing platform of our time."
    **Generated using the NVIDIA NeMo service (530B model)


  4. Transformer and LLM research papers per year
    [Chart: "Transformer and LLM Research Papers Per Year", 2017–2022, with model milestones and parameter counts (M Parameters) — Transformer, BERT, GPT-3, CODEX, MegaMolBART, NLLB-200, Dall-E 2, ChatGPT]


  5. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    STEP.3: Inference


  6. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    • Fine-Tuning can also start from an existing (published) checkpoint
    STEP.3: Inference


  7. NVIDIA NeMo for custom LLM development
    [Diagram: customization techniques for generative AI (P-tuning, SFT, Adapters, RLHF, AliBi) built on the NeMo Framework, delivered as part of NVIDIA AI Enterprise and available on NVIDIA DGX Cloud]

  8. NeMo Framework
    End-to-end framework for generative AI, covering Training and Inference
    ✓ 3D Parallelism: Data, Tensor & Pipeline, and Sequence Parallelism, plus Selective Activation Recomputation
    ✓ LLM customization: Adapters, RLHF, AliBi, SFT
    ✓ Cluster support: SLURM, Nephele, Kubernetes (K8s)
    ✓ LLMs: BERT >100B, T5-MoE, T5, GPT-3, Inform ※1
    ✓ Multi-modal: Stable Diffusion, ViT, ViT-CLIP, Instruct-Pix2Pix, Imagen ※1
    Runs on NVIDIA DGX SuperPODs, NVIDIA DGX Cloud, and NVIDIA DGX Systems
    https://developer.nvidia.com/nemo-framework
    ※1 Inform, Multi-modal


  9. Nemo Toolkit
    Software stack of the NeMo Framework for generative AI:
    • Nemo Training container: NGC PyTorch, PyTorch Lightning, Megatron Core, and Nemo Megatron Launcher, for 3D-parallel training of generative AI models on NVIDIA GPUs
    • Nemo Inference container: TensorRT-LLM and Triton Inference Server, for GPU-accelerated LLM inference


  10. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    STEP.3: Inference


  11. Tensor & Pipeline Parallelism Sequence Parallelism
    Selective Activation
    Recomputation
    AI
    GPU 0
    Time
    Feature
    Traditionally, LLMs (>175GB)
    every activation is
    recomputed
    GPU 1
    GPU 2
    . .
    .
    . .
    .
    . .
    .
    . .
    .
    Saved
    Recomputed
    Batch
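    To make the recomputation idea concrete, here is a generic PyTorch sketch of activation recomputation via gradient checkpointing; this is a simplified stand-in, not NeMo's selective implementation, and the layer sizes are arbitrary.

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    # A small stack of blocks; with checkpointing, activations inside each block
    # are not stored during the forward pass and are recomputed during backward,
    # trading extra compute for a smaller activation-memory footprint.
    layers = nn.ModuleList([nn.Sequential(nn.Linear(1024, 1024), nn.GELU()) for _ in range(8)])

    def forward_with_recompute(x: torch.Tensor) -> torch.Tensor:
        for block in layers:
            x = checkpoint(block, x, use_reentrant=False)  # recompute this block's activations in backward
        return x

    x = torch.randn(4, 1024, requires_grad=True)
    loss = forward_with_recompute(x).sum()
    loss.backward()  # activations are recomputed block by block here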


  12. 3D Parallelism (NVIDIA Megatron)
    Efficient Large-Scale Language Model Training on GPU Clusters, Deepak Narayanan et al., 2021


  13. 3D Parallelism (NVIDIA Megatron)
    Efficient Large-Scale Language Model Training on GPU Clusters, Deepak Narayanan et al., 2021
    DGX A100 interconnect bandwidth:
    NVLink (gen3): 300 GB/s
    InfiniBand (HDR): 25 GB/s
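    As a toy illustration of the tensor-parallel part of 3D parallelism (a generic sketch, not Megatron code; all sizes are arbitrary), a linear layer's weight can be split column-wise across devices and the partial outputs concatenated:

    import torch

    d_model, d_ff, batch = 8, 16, 4
    x = torch.randn(batch, d_model)

    # Full (single-GPU) linear layer: y = x @ W
    W = torch.randn(d_model, d_ff)
    y_full = x @ W

    # Tensor parallelism (column split): each "GPU" holds half of W's columns
    # and computes its slice of the output independently; results are gathered.
    W0, W1 = W.chunk(2, dim=1)          # shard 0 and shard 1
    y0, y1 = x @ W0, x @ W1             # local matmuls (would run on different GPUs)
    y_tp = torch.cat([y0, y1], dim=1)   # all-gather along the feature dimension

    assert torch.allclose(y_full, y_tp, atol=1e-5)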


  14. FSDP (Fully Sharded Data Parallel), ZeRO-3
    Available in PyTorch (FSDP) and DeepSpeed (ZeRO-3)
    FSDP combines aspects of Data Parallelism and Model Parallelism
    • How does it differ from 3D Parallelism?
      When sharding across N GPUs, the input data is also split into N parts, as in Data Parallelism
    https://www.deepspeed.ai/2021/03/07/zero3-offload.html
    [Figure: Data parallel vs. ZeRO-3]
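    A minimal PyTorch FSDP sketch (assuming a multi-GPU environment launched with torchrun; the model and sizes are placeholders, not from the deck):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes launch via `torchrun --nproc_per_node=<num_gpus> this_script.py`
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()
    model = FSDP(model)  # parameters, gradients, and optimizer state are sharded across ranks

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optimizer.step()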


  15. 3D Parallelism vs. FSDP
    • FSDP
    • FSDP
    • Activations

  16. 3D Parallelism vs. FSDP
    • FSDP
    • FSDP
    • Activations
    Parallelism


  17. Nemo Framework training performance (training on 300 billion tokens)
    Time to train 300B tokens in DAYS (A100) – BF16

    Model        3072 GPUs        1600 GPUs        800 GPUs         480 GPUs        160 GPUs        64 GPUs
                 (384 DGX A100)   (200 DGX A100)   (100 DGX A100)   (60 DGX A100)   (20 DGX A100)   (8 DGX A100)
    GPT-3: 2B    0.2              0.3              0.6              1.1             3.2             8.0
    GPT-3: 5B    0.4              0.8              1.6              2.7             8.0             20.0
    GPT-3: 20B   1.7              3.2              6.4              10.7            32.0            79.9
    GPT-3: 43B   3.6              6.9              13.7             22.9            68.7            171.7

    Pre-Training time using Nemo Framework can be estimated as:
    seconds ≈ 8 · P · T / (n · X)
    where P = model parameters, T = tokens, n = GPU count,
    X = achieved tFLOPS per GPU (A100 theoretical peak ~312, measured average ~163)
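    As a quick sanity check of this rule of thumb, here is a small Python sketch (the function name and the 163 tFLOPS default are illustrative; achieved tFLOPS per GPU varies with model size and cluster scale, so results only roughly track the table above):

    # Estimate pre-training time in days from the slide's rule of thumb:
    # seconds ≈ 8 * P * T / (n * X)
    def training_days(params: float, tokens: float, num_gpus: int,
                      tflops_per_gpu: float = 163.0) -> float:
        seconds = 8 * params * tokens / (num_gpus * tflops_per_gpu * 1e12)
        return seconds / 86400  # seconds per day

    # Example: GPT-3 43B, 300B tokens, 3072 GPUs (384 DGX A100)
    print(round(training_days(43e9, 300e9, 3072), 1))
    # ~2.4 days with X = 163; the table reports 3.6 days, reflecting a lower
    # achieved tFLOPS per GPU at that scale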


  18. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    STEP.3: Inference


  19. Publicly available pretrained models for generative AI
    NVIDIA Nemo Framework
    • Pretrained models published on Hugging Face and NGC can be used from the Nemo Framework

    from nemo.collections import nlp as nemo_nlp
    from nemo.utils.exp_manager import exp_manager
    import pytorch_lightning as pl
    from omegaconf import OmegaConf

    # update config settings
    config = OmegaConf.load("text_classification_config.yaml")
    config.model.tokenizer.vocab_file = "vocab.txt"
    config.model.dataset.num_classes = 2
    config.model.train_ds.file_path = "train_nemo_format.tsv"
    config.model.validation_ds.file_path = "dev_nemo_format.tsv"
    config.model.language_model.pretrained_model_name = "cl-tohoku/bert-base-japanese"

    trainer = pl.Trainer(**config.trainer)
    model = nemo_nlp.models.TextClassificationModel(cfg=config.model, trainer=trainer)
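    A short follow-up, assuming the TSV files follow NeMo's text-classification format: once the model is constructed, fine-tuning runs through the standard PyTorch Lightning loop, and the result can be saved as a .nemo archive (the output file name here is illustrative).

    # Set up experiment logging/checkpointing, run fine-tuning, and save the model.
    exp_manager(trainer, config.get("exp_manager", None))
    trainer.fit(model)
    model.save_to("bert_base_japanese_text_classification.nemo")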


  20. Community models
    NVIDIA Nemo Framework
    • Community checkpoints such as Llama2 (Hugging Face format) and StarCoder can be converted to the NeMo format

    #!/bin/sh
    git-lfs clone https://huggingface.co/meta-llama/Llama-2-7b-hf
    python3 /opt/NeMo/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
        --in-file=./Llama-2-7b-hf/ --out-file=llama2-7b.nemo

    [Reference] convert-llama2-from-huggingface-format-to-nemo-format
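    A minimal sketch of loading the converted checkpoint afterwards; this is an assumption for illustration (NeMo training container, single GPU), not part of the deck, and the exact restore call typically needs a Lightning Trainer configured with NeMo's Megatron strategy.

    # Restore the converted Llama2 checkpoint inside the NeMo training container.
    import pytorch_lightning as pl
    from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
    from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

    trainer = pl.Trainer(devices=1, accelerator="gpu", strategy=NLPDDPStrategy())
    model = MegatronGPTModel.restore_from("llama2-7b.nemo", trainer=trainer)
    print(model.cfg.num_layers)  # inspect the restored model configuration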


  21. Nemo model customization tools
    (data, compute & investment and accuracy for specific use-cases both increase from PROMPT ENGINEERING toward INSTRUCTION TUNING)
    PROMPT ENGINEERING: Few-shot learning, Chain-of-thought reasoning, System prompting
    PROMPT LEARNING: Prompt tuning, P-tuning
    PARAMETER EFFICIENT FINE-TUNING: Adapters, LoRA, IA3
    INSTRUCTION TUNING: SFT, RLHF
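    To make the PEFT column concrete, here is a generic LoRA sketch in plain PyTorch (an illustration of the technique, not NeMo's implementation; all sizes are arbitrary): the frozen pretrained weight is augmented with a small trainable low-rank update.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Generic LoRA: y = base(x) + B(A(x)) * (alpha / r); only A and B are trained."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # freeze the pretrained weights
            self.lora_a = nn.Linear(base.in_features, r, bias=False)
            self.lora_b = nn.Linear(r, base.out_features, bias=False)
            nn.init.zeros_(self.lora_b.weight)   # start as a no-op update
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.lora_b(self.lora_a(x)) * self.scale

    layer = LoRALinear(nn.Linear(1024, 1024))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)  # 16384 trainable parameters vs. ~1M frozen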


  22. Nemo Framework
    • INSTRUCTION TUNING
    • [Playbook] NeMo Framework Supervised fine-tuning (SFT) with Llama2
    • [Documentation] Reinforcement Learning from Human Feedback
    • [Documentation] Instruction Following Taught by Supervised Fine-Tuning (SFT)
    • [Documentation] Model Fine-Tuning
    • [Jupyter Notebook] SFT example for Text Classification
    • PEFT
    • [Playbook] NeMo Framework PEFT with Llama2
    • [Documentation] Generalized PEFT Framework
    • PEFT Training and Inference for GPT-style Models
    • PEFT Training and Inference for mT5/T5-style Models
    • [Jupyter Notebook] Optimize GPT model for Extractive Q&A using LoRA
    • Prompt Learning
    • [Documentation] Model Prompt Learning


  23. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    STEP.3: Inference


  24. TensorRT-LLM: accelerating LLM inference
    https://github.com/NVIDIA/TensorRT-LLM
    TensorRT-LLM is an open-source library for accelerating LLM inference on NVIDIA GPUs.
    Models are defined with a PyTorch-like Python API:

    # define a new activation
    def silu(input: Tensor) -> Tensor:
        return input * sigmoid(input)

    # implement models like in DL frameworks
    class BertModel(Module):
        def __init__(…):
            self.layers = ModuleList([…])

        def forward(…):
            hidden = self.embedding(…)
            for layer in self.layers:
                hidden = layer(hidden)
            return hidden

    • Served through the Triton Inference Server backend for LLMs, with in-flight batching
    • Supports multi-GPU, multi-node LLM inference
    (Numbers are preliminary based on internal evaluation)


  25. TensorRT-LLM
    Positioning relative to TensorRT and FasterTransformer
    • TensorRT-LLM succeeds FasterTransformer as NVIDIA's library for LLM inference
    • Models are described with a PyTorch-like Python API and compiled into TensorRT engines
    • Builds on FasterTransformer, TensorRT and its Python API, OpenAI Triton, and CUTLASS kernels


  26. TensorRT-LLM
    • Custom multi-head attention (MHA) kernels, in-flight batching, paged attention, and quantized KV cache
    • Lower TCO
    • Lower energy per inference
    (H100 FP8 with in-flight batching in TensorRT-LLM vs. A100 FP16 PyTorch)
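    As a rough conceptual illustration of why in-flight (continuous) batching raises throughput — a plain-Python scheduling toy, not TensorRT-LLM code; the request lengths are made up and prefill is ignored:

    def static_batching_steps(lengths, batch_size):
        # Each batch occupies the GPU until its longest request finishes.
        steps = 0
        for i in range(0, len(lengths), batch_size):
            steps += max(lengths[i:i + batch_size])
        return steps

    def inflight_batching_steps(lengths, batch_size):
        # Finished requests are immediately replaced from the queue,
        # so each batch slot stays busy (greedy earliest-free-slot schedule).
        import heapq
        slots = [0] * batch_size
        for n in lengths:
            start = heapq.heappop(slots)
            heapq.heappush(slots, start + n)
        return max(slots)

    lengths = [8, 128, 16, 256, 32, 64, 512, 8]  # generated tokens per request
    print(static_batching_steps(lengths, 4))     # 768 decode steps
    print(inflight_batching_steps(lengths, 4))   # 552 decode steps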


  27. Other features
    The Nemo Framework also provides:
    • Data curation: [Documentation] NeMo Data Curator
    • Hyperparameter tool: [GTC Session] s41904 How to Avoid the Staggering Cost of Training State-of-the-art Large Language Models
    • Guardrails for LLM applications: NeMo-Guardrails

  28. Summary
    Nemo Framework
    • Covers the LLM workflow from Pre-Training through Inference
    • 3D parallel training based on NVIDIA Megatron
    • HuggingFace checkpoints can be converted to NVIDIA (NeMo) checkpoints
    • Model customization via INSTRUCTION TUNING, PEFT, and PROMPT LEARNING
    • Fast inference with TensorRT-LLM + Triton Inference Server

  29. Current model support and how to get the containers
    Nemo Framework
    • Language Models: GPT, T5, mT5, T5-MoE, BERT
    • Text-to-Image Models: Stable Diffusion v1.5/v2.0, Imagen
    • Image-to-Image Models: Vision Transformers, CLIP, Dreambooth, InstructPix2Pix
    Download Now – Language (Now Available!)
    Apply Now – Multimodal (Coming Soon!)
    [Example — Prompt: "A 'sks' dog mecha robot." Instruction: "Make it on a beach"]


  30. Appendix.
    • NVIDIA Generative AI Solutions
    • NVIDIA NeMo Framework
    • NeMo Guardrails TechBlog
    • What are Large Language Models?
    • What Are Large Language Models Used For?
    • What are Foundation Models?
    • How To Create A Custom Language Model?
    • Adapting P-Tuning to Solve Non-English Downstream Tasks
    • NVIDIA AI Platform Delivers Big Gains for Large Language Models
    • The King’s Swedish: AI Rewrites the Book in Scandinavia
    • eBook Asset
    • No Hang Ups With Hangul: KT Trains Smart Speakers, Customer Call Centers With NVIDIA AI
    GTC Sessions
    • How to Build Generative AI for Enterprise Use-cases
    • Leveraging Large Language Models for Generating Content
    • Power Of Large Language Models: The Current State and Future Potential
    • Generative AI Demystified
    • Efficient At-Scale Training and Deployment of Large Language Models – GTC Session
    • Hyperparameter Tool GTC Session
