
20231102 Introduction to Nemo Framework, NVIDIA's Solution for Generative AI

Murakami Mana
November 28, 2023


In recent years, generative AI technology has advanced remarkably, and its industrial applications are drawing growing attention.
Nemo Framework, NVIDIA's framework for generative AI, uses GPUs to accelerate both LLM training and inference with trained models.
This session gives an overview of NVIDIA's work on generative AI and of the Nemo Framework.



Transcript

  1. Introduction to Nemo Framework, NVIDIA's Solution for Generative AI
    Mana Murakami, Senior Solution Architect, NVIDIA | Nov 2nd 2023


  2. Agenda
    • NVIDIA and generative AI
    • NVIDIA Nemo Framework


  3. Generative AI
    [Figure: examples of generative AI models and data — Wikipedia, NLLB-200, CODEX, MegaMolBART, e-Diffi, GPT-3]
    How has NVIDIA contributed to acceleration of AI?
    "NVIDIA has been a pioneer in the field of AI since the very beginning. Our GPU platform has enabled the rapid development of AI – from the training of neural networks, to inference in the data center, on-device AI in the car and in the cloud, and the deployment of AI to tackle challenging problems like conversational AI and translation. NVIDIA's GPU-accelerated computing platform is the engine of AI – it is the most important computing platform of our time."
    **Generated using the NVIDIA NeMo service (530B model)


  4. Transformer and LLM research papers per year
    [Chart: "Transformer and LLM Research Papers Per Year", 2017–2022, with model milestones and parameter counts (M Parameters) — Transformer, BERT, GPT-3, CODEX, MegaMolBART, NLLB-200, Dall-E 2, ChatGPT]


  5. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    STEP.3: Inference


  6. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    • Fine-Tuning can also start from an existing (published) checkpoint
    STEP.3: Inference


  7. NVIDIA NeMo for custom LLM development
    [Diagram: customization techniques for generative AI (P-tuning, SFT, Adapters, RLHF, AliBi) built on the NeMo Framework, delivered as part of NVIDIA AI Enterprise and available on NVIDIA DGX Cloud]

  8. NeMo Framework
    End-to-end framework for generative AI, covering Training and Inference
    ✓ 3D Parallelism: Data, Tensor & Pipeline, and Sequence Parallelism, plus Selective Activation Recomputation
    ✓ LLM customization: Adapters, RLHF, AliBi, SFT
    ✓ Cluster support: SLURM, Nephele, Kubernetes (K8s)
    ✓ LLMs: BERT >100B, T5-MoE, T5, GPT-3, Inform ※1
    ✓ Multi-modal: Stable Diffusion, ViT, ViT-CLIP, Instruct-Pix2Pix, Imagen ※1
    Runs on NVIDIA DGX SuperPODs, NVIDIA DGX Cloud, and NVIDIA DGX Systems
    https://developer.nvidia.com/nemo-framework
    ※1 Inform, Multi-modal


  9. Nemo Toolkit
    Software stack of the NeMo Framework for generative AI:
    • Nemo Training container: NGC PyTorch, PyTorch Lightning, Megatron Core, and Nemo Megatron Launcher, for 3D-parallel training of generative AI models on NVIDIA GPUs
    • Nemo Inference container: TensorRT-LLM and Triton Inference Server, for GPU-accelerated LLM inference


  10. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    STEP.3: Inference


  11. Tensor & Pipeline Parallelism Sequence Parallelism
    Selective Activation
    Recomputation
    AI
    GPU 0
    Time
    Feature
    Traditionally, LLMs (>175GB)
    every activation is
    recomputed
    GPU 1
    GPU 2
    . .
    .
    . .
    .
    . .
    .
    . .
    .
    Saved
    Recomputed
    Batch
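    To make the recomputation idea concrete, here is a generic PyTorch sketch of activation recomputation via gradient checkpointing; this is a simplified stand-in, not NeMo's selective implementation, and the layer sizes are arbitrary.

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    # A small stack of blocks; with checkpointing, activations inside each block
    # are not stored during the forward pass and are recomputed during backward,
    # trading extra compute for a smaller activation-memory footprint.
    layers = nn.ModuleList([nn.Sequential(nn.Linear(1024, 1024), nn.GELU()) for _ in range(8)])

    def forward_with_recompute(x: torch.Tensor) -> torch.Tensor:
        for block in layers:
            x = checkpoint(block, x, use_reentrant=False)  # recompute this block's activations in backward
        return x

    x = torch.randn(4, 1024, requires_grad=True)
    loss = forward_with_recompute(x).sum()
    loss.backward()  # activations are recomputed block by block here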


  12. 3D Parallelism (NVIDIA Megatron)
    Efficient Large-Scale Language Model Training on GPU Clusters, Deepak Narayanan et al., 2021


  13. 3D Parallelism (NVIDIA Megatron)
    Efficient Large-Scale Language Model Training on GPU Clusters, Deepak Narayanan et al., 2021
    DGX A100 interconnect bandwidth:
    NVLink (gen3): 300 GB/s
    InfiniBand (HDR): 25 GB/s
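    As a toy illustration of the tensor-parallel part of 3D parallelism (a generic sketch, not Megatron code; all sizes are arbitrary), a linear layer's weight can be split column-wise across devices and the partial outputs concatenated:

    import torch

    d_model, d_ff, batch = 8, 16, 4
    x = torch.randn(batch, d_model)

    # Full (single-GPU) linear layer: y = x @ W
    W = torch.randn(d_model, d_ff)
    y_full = x @ W

    # Tensor parallelism (column split): each "GPU" holds half of W's columns
    # and computes its slice of the output independently; results are gathered.
    W0, W1 = W.chunk(2, dim=1)          # shard 0 and shard 1
    y0, y1 = x @ W0, x @ W1             # local matmuls (would run on different GPUs)
    y_tp = torch.cat([y0, y1], dim=1)   # all-gather along the feature dimension

    assert torch.allclose(y_full, y_tp, atol=1e-5)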


  14. FSDP (Fully Sharded Data Parallel), ZeRO-3
    Available in PyTorch (FSDP) and DeepSpeed (ZeRO-3)
    FSDP combines aspects of Data Parallelism and Model Parallelism
    • How does it differ from 3D Parallelism?
      When sharding across N GPUs, the input data is also split into N parts, as in Data Parallelism
    https://www.deepspeed.ai/2021/03/07/zero3-offload.html
    [Figure: Data parallel vs. ZeRO-3]
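    A minimal PyTorch FSDP sketch (assuming a multi-GPU environment launched with torchrun; the model and sizes are placeholders, not from the deck):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes launch via `torchrun --nproc_per_node=<num_gpus> this_script.py`
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()
    model = FSDP(model)  # parameters, gradients, and optimizer state are sharded across ranks

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optimizer.step()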


  15. 3D Parallelism vs. FSDP
    • FSDP
    • FSDP
    • Activations

  16. 3D Parallelism vs. FSDP
    • FSDP
    • FSDP
    • Activations
    Parallelism


  17. Nemo Framework training performance (training on 300 billion tokens)
    Time to train 300B tokens in DAYS (A100) – BF16

    Model        3072 GPUs        1600 GPUs        800 GPUs         480 GPUs        160 GPUs        64 GPUs
                 (384 DGX A100)   (200 DGX A100)   (100 DGX A100)   (60 DGX A100)   (20 DGX A100)   (8 DGX A100)
    GPT-3: 2B    0.2              0.3              0.6              1.1             3.2             8.0
    GPT-3: 5B    0.4              0.8              1.6              2.7             8.0             20.0
    GPT-3: 20B   1.7              3.2              6.4              10.7            32.0            79.9
    GPT-3: 43B   3.6              6.9              13.7             22.9            68.7            171.7

    Pre-Training time using Nemo Framework can be estimated as:
    seconds ≈ 8 · P · T / (n · X)
    where P = model parameters, T = tokens, n = GPU count,
    X = achieved tFLOPS per GPU (A100 theoretical peak ~312, measured average ~163)
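    As a quick sanity check of this rule of thumb, here is a small Python sketch (the function name and the 163 tFLOPS default are illustrative; achieved tFLOPS per GPU varies with model size and cluster scale, so results only roughly track the table above):

    # Estimate pre-training time in days from the slide's rule of thumb:
    # seconds ≈ 8 * P * T / (n * X)
    def training_days(params: float, tokens: float, num_gpus: int,
                      tflops_per_gpu: float = 163.0) -> float:
        seconds = 8 * params * tokens / (num_gpus * tflops_per_gpu * 1e12)
        return seconds / 86400  # seconds per day

    # Example: GPT-3 43B, 300B tokens, 3072 GPUs (384 DGX A100)
    print(round(training_days(43e9, 300e9, 3072), 1))
    # ~2.4 days with X = 163; the table reports 3.6 days, reflecting a lower
    # achieved tFLOPS per GPU at that scale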


  18. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    STEP.3: Inference


  19. Publicly available pretrained models for generative AI
    NVIDIA Nemo Framework
    • Pretrained models published on Hugging Face and NGC can be used from the Nemo Framework

    from nemo.collections import nlp as nemo_nlp
    from nemo.utils.exp_manager import exp_manager
    import pytorch_lightning as pl
    from omegaconf import OmegaConf

    # update config settings
    config = OmegaConf.load("text_classification_config.yaml")
    config.model.tokenizer.vocab_file = "vocab.txt"
    config.model.dataset.num_classes = 2
    config.model.train_ds.file_path = "train_nemo_format.tsv"
    config.model.validation_ds.file_path = "dev_nemo_format.tsv"
    config.model.language_model.pretrained_model_name = "cl-tohoku/bert-base-japanese"

    trainer = pl.Trainer(**config.trainer)
    model = nemo_nlp.models.TextClassificationModel(cfg=config.model, trainer=trainer)
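    A short follow-up, assuming the TSV files follow NeMo's text-classification format: once the model is constructed, fine-tuning runs through the standard PyTorch Lightning loop, and the result can be saved as a .nemo archive (the output file name here is illustrative).

    # Set up experiment logging/checkpointing, run fine-tuning, and save the model.
    exp_manager(trainer, config.get("exp_manager", None))
    trainer.fit(model)
    model.save_to("bert_base_japanese_text_classification.nemo")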


  20. Community models
    NVIDIA Nemo Framework
    • Community checkpoints such as Llama2 (Hugging Face format) and StarCoder can be converted to the NeMo format

    #!/bin/sh
    git-lfs clone https://huggingface.co/meta-llama/Llama-2-7b-hf
    python3 /opt/NeMo/scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
        --in-file=./Llama-2-7b-hf/ --out-file=llama2-7b.nemo

    [Reference] convert-llama2-from-huggingface-format-to-nemo-format
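    A minimal sketch of loading the converted checkpoint afterwards; this is an assumption for illustration (NeMo training container, single GPU), not part of the deck, and the exact restore call typically needs a Lightning Trainer configured with NeMo's Megatron strategy.

    # Restore the converted Llama2 checkpoint inside the NeMo training container.
    import pytorch_lightning as pl
    from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
    from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy

    trainer = pl.Trainer(devices=1, accelerator="gpu", strategy=NLPDDPStrategy())
    model = MegatronGPTModel.restore_from("llama2-7b.nemo", trainer=trainer)
    print(model.cfg.num_layers)  # inspect the restored model configuration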


  21. Nemo model customization tools
    (data, compute & investment and accuracy for specific use-cases both increase from PROMPT ENGINEERING toward INSTRUCTION TUNING)
    PROMPT ENGINEERING: Few-shot learning, Chain-of-thought reasoning, System prompting
    PROMPT LEARNING: Prompt tuning, P-tuning
    PARAMETER EFFICIENT FINE-TUNING: Adapters, LoRA, IA3
    INSTRUCTION TUNING: SFT, RLHF
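    To make the PEFT column concrete, here is a generic LoRA sketch in plain PyTorch (an illustration of the technique, not NeMo's implementation; all sizes are arbitrary): the frozen pretrained weight is augmented with a small trainable low-rank update.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Generic LoRA: y = base(x) + B(A(x)) * (alpha / r); only A and B are trained."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # freeze the pretrained weights
            self.lora_a = nn.Linear(base.in_features, r, bias=False)
            self.lora_b = nn.Linear(r, base.out_features, bias=False)
            nn.init.zeros_(self.lora_b.weight)   # start as a no-op update
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.lora_b(self.lora_a(x)) * self.scale

    layer = LoRALinear(nn.Linear(1024, 1024))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)  # 16384 trainable parameters vs. ~1M frozen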


  22. Nemo Framework
    • INSTRUCTION TUNING
    • [Playbook] NeMo Framework Supervised fine-tuning (SFT) with Llama2
    • [Documentation] Reinforcement Learning from Human Feedback
    • [Documentation] Instruction Following Taught by Supervised Fine-Tuning (SFT)
    • [Documentation] Model Fine-Tuning
    • [Jupyter Notebook] SFT example for Text Classification
    • PEFT
    • [Playbook] NeMo Framework PEFT with Llama2
    • [Documentation] Generalized PEFT Framework
    • PEFT Training and Inference for GPT-style Models
    • PEFT Training and Inference for mT5/T5-style Models
    • [Jupyter Notebook] Optimize GPT model for Extractive Q&A using LoRA
    • Prompt Learning
    • [Documentation] Model Prompt Learning


  23. LLM development steps
    STEP.1: Pre-Training
    STEP.2: Fine-Tuning (SFT and RLHF)
    STEP.3: Inference


  24. TensorRT-LLM: accelerating LLM inference
    https://github.com/NVIDIA/TensorRT-LLM
    TensorRT-LLM is an open-source library for accelerating LLM inference on NVIDIA GPUs.
    Models are defined with a PyTorch-like Python API:

    # define a new activation
    def silu(input: Tensor) -> Tensor:
        return input * sigmoid(input)

    # implement models like in DL frameworks
    class BertModel(Module):
        def __init__(…):
            self.layers = ModuleList([…])

        def forward(…):
            hidden = self.embedding(…)
            for layer in self.layers:
                hidden = layer(hidden)
            return hidden

    • Served through the Triton Inference Server backend for LLMs, with in-flight batching
    • Supports multi-GPU, multi-node LLM inference
    (Numbers are preliminary based on internal evaluation)


  25. TensorRT-LLM
    Positioning relative to TensorRT and FasterTransformer
    • TensorRT-LLM succeeds FasterTransformer as NVIDIA's library for LLM inference
    • Models are described with a PyTorch-like Python API and compiled into TensorRT engines
    • Builds on FasterTransformer, TensorRT and its Python API, OpenAI Triton, and CUTLASS kernels


  26. TensorRT-LLM
    • Custom multi-head attention (MHA) kernels, in-flight batching, paged attention, and quantized KV cache
    • Lower TCO
    • Lower energy per inference
    (H100 FP8 with in-flight batching in TensorRT-LLM vs. A100 FP16 PyTorch)
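    As a rough conceptual illustration of why in-flight (continuous) batching raises throughput — a plain-Python scheduling toy, not TensorRT-LLM code; the request lengths are made up and prefill is ignored:

    def static_batching_steps(lengths, batch_size):
        # Each batch occupies the GPU until its longest request finishes.
        steps = 0
        for i in range(0, len(lengths), batch_size):
            steps += max(lengths[i:i + batch_size])
        return steps

    def inflight_batching_steps(lengths, batch_size):
        # Finished requests are immediately replaced from the queue,
        # so each batch slot stays busy (greedy earliest-free-slot schedule).
        import heapq
        slots = [0] * batch_size
        for n in lengths:
            start = heapq.heappop(slots)
            heapq.heappush(slots, start + n)
        return max(slots)

    lengths = [8, 128, 16, 256, 32, 64, 512, 8]  # generated tokens per request
    print(static_batching_steps(lengths, 4))     # 768 decode steps
    print(inflight_batching_steps(lengths, 4))   # 552 decode steps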


  27. Other features
    The Nemo Framework also provides:
    • Data curation: [Documentation] NeMo Data Curator
    • Hyperparameter tool: [GTC Session] s41904 How to Avoid the Staggering Cost of Training State-of-the-art Large Language Models
    • Guardrails for LLM applications: NeMo-Guardrails

  28. Summary
    Nemo Framework
    • Covers the LLM workflow from Pre-Training through Inference
    • 3D parallel training based on NVIDIA Megatron
    • HuggingFace checkpoints can be converted to NVIDIA (NeMo) checkpoints
    • Model customization via INSTRUCTION TUNING, PEFT, and PROMPT LEARNING
    • Fast inference with TensorRT-LLM + Triton Inference Server

  29. Current model support and how to get the containers
    Nemo Framework
    • Language Models: GPT, T5, mT5, T5-MoE, BERT
    • Text-to-Image Models: Stable Diffusion v1.5/v2.0, Imagen
    • Image-to-Image Models: Vision Transformers, CLIP, Dreambooth, InstructPix2Pix
    Download Now – Language (Now Available!)
    Apply Now – Multimodal (Coming Soon!)
    [Example — Prompt: "A 'sks' dog mecha robot." Instruction: "Make it on a beach"]


  30. Appendix.
    • NVIDIA Generative AI Solutions
    • NVIDIA NeMo Framework
    • NeMo Guardrails TechBlog
    • What are Large Language Models?
    • What Are Large Language Models Used For?
    • What are Foundation Models?
    • How To Create A Custom Language Model?
    • Adapting P-Tuning to Solve Non-English Downstream Tasks
    • NVIDIA AI Platform Delivers Big Gains for Large Language Models
    • The King’s Swedish: AI Rewrites the Book in Scandinavia
    • eBook Asset
    • No Hang Ups With Hangul: KT Trains Smart Speakers, Customer Call Centers With NVIDIA AI
    GTC Sessions
    • How to Build Generative AI for Enterprise Use-cases
    • Leveraging Large Language Models for Generating Content
    • Power Of Large Language Models: The Current State and Future Potential
    • Generative AI Demystified
    • Efficient At-Scale Training and Deployment of Large Language Models – GTC Session
    • Hyperparameter Tool GTC Session
