Generative AI Frontline in 2023/2024

Generative AI Frontline in 2023/24
Seoul AI Hub, CES 2024
Jeongkyu Shin, Lablup Inc.
2024. 1. 12

Generative AI
• A category of deep learning models
  - Segmentation, categorization, etc.
  - Deep learning models that produce specific outputs
• Examples
  - The ability to create content: pictures, text, sounds, and more
  - Generate results or intermediate queries based on the user's input or interaction
[Image: generative fountain with steam punk style, starlight from the deep sky, breeze on the water surface.]

Generative AI
• Generative AI in the spotlight
  - Large language models
  - Image generation models
  - Multimodal models
• All different models?
  - Although they look like separate models, they share an essence.

Explosive evolution of language models
• Evolution
  - Non-linear processes
  - Explosive, exponential growth at some point
• 2018: Rapid evolution since the Transformer architecture
• 2020: Discovering the idiosyncrasies of large language models
• 2022: Launching a service to democratize large language models
  - ChatGPT... Everybody knows.
[1] https://arxiv.org/pdf/2304.13712.pdf

Pre-Training Model / Foundation Model
• Foundation model
  - Large AI models trained on massive unlabeled data in a self-supervised manner
  - Perform large-scale pre-training on a wide range of data
  - Ready to use after fine-tuning or in-context learning for a variety of tasks
• Why big models?
  - Using a sledgehammer to crack a nut?
  - All you need is a nutcracker, but every task requires something the size of a sledgehammer.
    - Task understanding: context, based on logical structure
    - The process must be fully human-interactive
    - These two things are huge problems
• Problem
  - Training a foundation model requires a lot of resources

Fine-tuning
• Service model: base foundation model plus fine-tuning
  - Training a model from scratch is too costly
• Fine-tuning
  - The base model, specialized in language processing, lacks purposefulness
    - Models trained on the structure of the language
  - Fine-tune for specialized knowledge and answer sets
  - Put code in the middle to reference external search engines and databases for actual data, etc.
• Example: Pathways
  - Google Pathways: underlying model structure
  - PaLM 2: Pathways-structure-based language model
  - Med-PaLM 2: a medical-knowledge-specific fine-tuned model
  - Sec-PaLM 2: a security-specific fine-tuned model
  - Minerva: a fine-tuned model that specializes in math

Outlook through early 2023
• Proprietary foundation models
  - A handful of large corporations exclusively develop pre-trained language models
  - These models are operated on vast cloud resources
  - Handle a variety of complex tasks
• Example of ChatGPT
  - No matter how I calculate, I can't get the cost right
  - Cost of ChatGPT 3.5 as calculated in Feb. 2023: $29 per user?
  - Can we do it cheaper with economies of scale?

After March 2023
• Changes in the proprietary-model-based business
  - Performance: Is it better than ChatGPT?
    - Publication timing delayed, and delayed further because of cost
    - Stay ahead of the competition: GPT-4 is now available as the default model for paid users (August 7, 2023)
  - Cost: Too expensive
    - Delayed commercialization
  - Finding potential: everyone started to think this could work really well.
    - Resetting nuanced relationships between stakeholders, etc.
• Is the underlying model also open source?
  - Various open-source-based models existed
    - Were unable to compete on size and performance.
  - Spring 2023
    - Startups: Let's show we can do it too!
    - Governments: if we rely on one company for this technology, there will be lock-in.

Open Source Large language models
• Public release / support of national base language models
  - Government: We will empower you!
    - Abu Dhabi (Falcon, Jun. 2023)
    - United Kingdom (ExaScale Supercomputer, July 2023)
    - Japan (SB Intuitions, August 7, 2023)
    - EU (Spain, Mistral on MareNostrum 5, December 2023)
    - India (Bhashini with Corover.ai, December 12, 2023)
• Open-source-based models
  - Enterprise: Meta Llama 2, Cerebras-GPT, StableLM, Mosaic MPT, Mistral, etc.
  - Communities: EleutherAI Pythia, Polyglot, BLOOM, GPT-J, RedPajama, OpenHathi, and more
  - National support: Falcon
• Foundation models / checkpoints for everyone
  - An open Korean model does not exist yet...

Battleground: Pre-trained language models
• Cloud-based
  - Gemini (December 2023) [1]
    - Google's next-generation language model / available via Bard
    - Developed in three sizes: Nano, Pro, Ultra
    - Coming soon to Android mobile
  - PaLM 2 (May 2023) [2]
    - Gecko, Otter, Bison, Unicorn
    - Application-specific development: Med-PaLM, Sec-PaLM, Duet AI integration
    - Specialized development for Korean and Japanese (!)
  - Claude v2 (July 2023)
    - Anthropic's improved language model
    - Long input token length: 100,000 tokens...
      - If this is long
      - If the post described earlier is very long, and the
      - Become a very memorable language model
• Open models
  - Falcon LLM (Jun. 2023)
    - A large language model funded by Abu Dhabi
    - Huge, unconstrained language models
    - Falcon 180B: the largest open-source language model (compare: GPT-3.5: 175B)
  - Llama 2 (July 2023)
    - Meta's improved Llama model
    - Allows virtually unlimited commercial use (virtually, but not unlimited)
  - Mistral (October 2023)
    - Most versatile model under 10B in size
      - Can be used as a calibration model for other models
    - Application cases that take advantage of the small size
      - Mixtral 8x7B: Mixture of Experts (MoE) at a practically applicable size
[1] https://deepmind.google/technologies/gemini/
[2] https://blog.google/technology/ai/google-palm-2-ai-large-language-model/
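The Mixture of Experts idea mentioned for Mixtral 8x7B can be sketched roughly as follows. This is a minimal illustration, not Mistral's implementation: the names `moe_forward` and `make_scaling_expert` are hypothetical, the experts are stand-in scaling functions instead of feed-forward blocks, and the router scores are given directly instead of being computed by a learned layer. The only point it shows is that each token is sent to the top 2 of 8 experts, so most expert parameters stay idle for that token.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router_scores: Sequence[float],
                experts: Sequence[Expert], top_k: int = 2) -> Vector:
    """Route one token to its top_k experts and mix their outputs."""
    # Indices of the top_k highest router scores (ties keep index order).
    chosen = sorted(range(len(experts)), key=lambda i: router_scores[i],
                    reverse=True)[:top_k]
    # Gate weights are renormalized over the chosen experts only.
    gates = softmax([router_scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)  # only the chosen experts are evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out


def make_scaling_expert(factor: float) -> Expert:
    """Hypothetical stand-in for an expert feed-forward block."""
    return lambda x: [factor * v for v in x]


# Eight toy experts; expert i multiplies its input by i.
experts = [make_scaling_expert(float(i)) for i in range(8)]
# Assumed router scores for one token: experts 2 and 4 tie for the top.
scores = [0.0, 0.1, 3.0, 0.2, 3.0, 0.0, -1.0, 0.5]
mixed = moe_forward([1.0, 2.0], scores, experts)
print(mixed)
```

With these assumed scores, experts 2 and 4 receive equal gate weight, so the result is the average of their two outputs; the other six experts are never called.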

Battleground: GPU Hardware Market / Situation
• GPU: Mythical product
  - Tesla orders 10,000 A100s (H2 2022), later reveals goal of autonomous data center powered by 100,000 GPUs
  - Microsoft / OpenAI orders 10,000 H100s (Jan. 2023)
  - X (formerly known as Twitter) orders 10,000 H100s (Apr. 2023)
  - Google builds A3 supercomputer with 26,000 A100/H100s (May 2023)
  - ByteDance orders 100,000 A800/H800 units (Jun. 2023 / $1B value)
  - Alibaba's H800 order of tens of thousands of units (Jun. 2023 / $4B in value)
  - After orders from ByteDance and Alibaba
    - U.S. begins export restrictions on H800 GPUs to the People's Republic of China (Jun. 2023)
    - I thought it wouldn't affect the amount already ordered, but...
• Give us a GPU too
  - You cannot get one at that price. Sorry.
https://www.hpcwire.com/2023/02/20/google-and-microsoft-set-up-ai-hardware-battle-with-next-generation-search/
https://cloud.google.com/blog/products/compute/introducing-a3-supercomputers-with-nvidia-h100-gpus?hl=en
https://www.cnbc.com/2023/07/28/microsoft-annual-report-highlights-importance-of-gpus.html
https://www.ajunews.com/view/20230727113146316

Battleground: GPU Hardware Market / Situation
• National rivalry
  - Treat GPUs as a strategic resource
  - After Huawei's announcement of plans for a Saudi cloud region
    - Start of U.S. export restrictions on GPUs to Saudi Arabia (Aug. 31, 2023)
  - Start of U.S. export restrictions on GPUs to China (Oct. 17, 2023)
    - A100, A800, H100, H800, L40, L40S, RTX 4090
    - Export restrictions on nearly all GPUs that can be utilized for AI, from high to mid spec
  - NVIDIA's response
    - RTX 4090D announced as a China-only GPU (Dec. 14, 2023)
    - RTX 5880 announced as a cut-down chip of the RTX 6000 (Jan. 9, 2024)
  - Volume offensive
    - Microsoft's H100 pre-orders: 200,000 units? (full year 2023)
  - Extrajudicial assistance
    - Israeli government subsidizes Intel's semiconductor plant ($3.2B, $4.1B; Dec. 26, 2023)
• Responses to demand outstripping supply
  - NVIDIA: announces support for inferencing on desktop-level GPUs on Windows (Oct. 17, 2023)
https://www.tomshardware.com/news/us-bans-sales-of-nvidias-h100-a100-gpus-to-middle-east
https://www.cnbc.com/2023/10/17/us-bans-export-of-more-ai-chips-including-nvidia-h800-to-china.html
https://blogs.nvidia.com/blog/2023/10/17/tensorrt-llm-windows-stable-diffusion-rtx/
https://videocardz.com/newz/nvidia-geforce-rtx-4090d-reportedly-has-no-oc-support-and-lower-tdp-of-425w

Battleground: GPU Hyperscaler Market / Situation
• Competition for resources for very large matrix computations on a national scale
  - United States
    - Sent first batch of Xeon Max to Argonne
    - Purchased initial quantities of Cerebras CS-2 and Groq (Summer 2022)
  - United Kingdom
    - ExaScale project (May 2023); decision to go with NVIDIA (Nov. 2023)
  - EU
    - MareNostrum 5 launched (Dec. 2023), ranked #8 in the Top500
  - Japan
    - SB Intuitions launched (Aug. 2023): a language model by the Japanese, for the Japanese, and made in Japan
    - Dedicated Fugaku for language models / preparing Tsubame 4.0 (Apr. 2024)
  - China
    - Buying up used GPUs around the world to get around the sweeping export restrictions (Nov. 2023)
    - Companies are repurposing gaming GPUs for AI
https://www.cnbc.com/2023/07/07/why-japan-is-lagging-behind-in-generative-ai-and-creation-of-llms.html
https://www.softbank.jp/en/corp/news/press/sbkk/2023/20230804_02/

Battleground: GPU Hardware Market / Situation
• GPU vendor competition
  - NVIDIA: H100 (2022), H200 (2024), B100 (2024), X100 (2025)
  - AMD: MI250X (2023), MI300A (2024), MI300X (2023): 192GB HBM3
  - Intel: Gaudi 2 (2023), Gaudi 3 (2024), GPU Max (2023)
• Hybrid APUs
  - NVIDIA: GH200 (2024), GX200 (2025)
  - AMD: MI300A (2024)
https://www.reuters.com/technology/microsoft-developing-its-own-ai-chip-information-2023-04-18/
https://www.hpcwire.com/2021/06/22/ahead-of-dojo-tesla-reveals-its-massive-precursor-supercomputer/

Battleground: GPU Hardware Market / Situation
• Approach from cloud and AI companies
  - Amazon: Inferentia2 (2022)
    - Configuring a NeuronCore-v1-based chiplet
  - Microsoft: Azure Maia 100, Cobalt 100 (2024)
    - Athena project unveiled, 2024 target release announced (Nov. 2023)
  - Meta: MTIA gen2
    - Initial model release in 2021, second-generation overview in May 2023
  - Tesla: Dojo (2023)
    - First tape-out in Jun. 2023
    - Toroidal architecture
    - Similar to Google TPU
https://www.reuters.com/technology/microsoft-developing-its-own-ai-chip-information-2023-04-18/
https://www.hpcwire.com/2021/06/22/ahead-of-dojo-tesla-reveals-its-massive-precursor-supercomputer/

Battleground: AI Workload Acceleration / NPUs
• AI accelerators for training
  - TPU v5p, v5e (Google)
  - Gaudi 2, Gaudi 3, GPU Max (Intel)
  - IPU / Bow (Graphcore)
  - Cerebras WSE (Cerebras)
  - SN30 / SN40L (SambaNova)
  - GroqChip / GroqCard (Groq)
  - Loihi 2 / Nahuku / Kapoho Point (Intel, 2022)
• AI accelerators / NPUs for inference
  - TPU v5e (Google)
  - BrainWave, Maia, Cobalt (Microsoft)
  - Alveo (AMD, Xilinx) / FPGA
• Developed in Korea
  - Sapeon: X220 (2020), X330 (2023)
  - FuriosaAI: Warboy (2021), Renegade (2024)
  - Rebellions: ATOM (2022)

Battleground: AI Workload Acceleration / NPUs
• AI accelerators for training
  - High difficulty of implementing general OPS
    - Isn't the transformer the only thing that matters anyway?
    - But transformers require a lot of OPS...
  - Take a detour: expand the set of supported models incrementally
    - Let's make training chips using the methodology of making inference chips!
  - The judgment that the major AI models have become commonplace
    - Transformer everywhere
• AI inference accelerators
  - Low power, low latency, low heat
  - PCIe, USB-C, and GPIO interfaces
  - More FPGA-based IP companies are emerging (BittWare, etc.)
  - RAM: depends on workload, GDDR6 / HBM2/3
  - Based on FP16 / BF16 / FP8 / INT8

Battleground: AI Workload Acceleration / NPUs
• Accurate Quantized Training (AQT)
  - "Pareto-Optimal Quantized ResNet Is Mostly 4-bit" [1]
  - If the transformer architecture loses accuracy when moved to INT8, why not just use INT8 for everything from training to serving?
  - Solution to the accuracy drop: fix by increasing training steps
  - Software: AQT (Google, 2023) [2]
    - JAX-based implementation
  - Hardware: TPU v5e / v5p (Google, 2023)
[1] arXiv:2105.03536, 2021
[2] https://github.com/google/aqt
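The INT8 representation that the slide refers to can be illustrated with a plain symmetric quantization round trip. This is a generic sketch, not the AQT library's API: `quantize_int8` and `dequantize_int8` are hypothetical helper names, the per-tensor scale is an assumption, and the weight values are made up for illustration. It shows where the accuracy drop comes from: each value is snapped to one of 255 integer levels, so the round trip is off by up to half a scale step.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the INT8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate floating-point values."""
    return [scale * q for q in quantized]


# Made-up weights for illustration only.
weights = [0.02, -1.27, 0.64, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, scale, max_error)
```

Quantization-aware approaches keep this rounding inside the training loop so that the model learns weights that survive it, which is the motivation the slide gives for using INT8 from training through serving.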

Battleground: The open software marketplace
• Training solutions
  - Megatron-DeepSpeed (Microsoft, Dec. 2022)
    - Integrates NVIDIA Megatron and Microsoft DeepSpeed to optimize training of massive deep learning models
  - ZenDNN (AMD, Sep. 2023)
    - AMD's response to NVIDIA cuDNN
    - Full ML stack support with ROCm
  - OpenXLA (Google, Jun. 2023)
    - Automatic hardware optimization / support for high-level fusion and GSPMD sharding
• Serving solutions
  - vLLM (Jun. 2023)
    - Gained momentum alongside open-source models / Llama support
    - Implements the PagedAttention algorithm for memory savings
    - ROCm support begins (Dec. 2023)
  - TensorRT-LLM (NVIDIA, Oct. 2023)
    - An optimized implementation of TensorRT focused on fast inference of large language models
    - Automatic quantization in combination with Triton Inference Server: integrated with INT4/INT8 weights and FP16 activations
    - Provides a simple and fast inference interface
https://github.com/microsoft/Megatron-DeepSpeed
https://arxiv.org/abs/2105.04663
https://github.com/vllm-project/vllm
https://github.com/NVIDIA/TensorRT-LLM
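The memory saving behind PagedAttention comes from storing each sequence's KV cache in fixed-size blocks that are allocated on demand, instead of reserving one contiguous region per sequence up front. The toy allocator below sketches only that bookkeeping. It is not vLLM's code or API: the class `PagedKVCache`, its methods, and the block counts are all hypothetical, and no attention computation is performed.

```python
from typing import Dict, List


class PagedKVCache:
    """Toy block allocator: maps each sequence to fixed-size KV blocks."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.block_tables: Dict[str, List[int]] = {}
        self.lengths: Dict[str, int] = {}

    def append_token(self, seq_id: str) -> int:
        """Reserve room for one more token; return its physical block id."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        # A new block is needed only when the last one is full (or none exists).
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        return table[-1]

    def release(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")  # 3 tokens occupy 2 blocks
cache.append_token("b")      # 1 token occupies 1 block
tables_before = {seq: list(blocks) for seq, blocks in cache.block_tables.items()}
print(tables_before)
cache.release("a")           # blocks of "a" become reusable immediately
print(cache.free_blocks)
```

Because a sequence never holds more than one partially filled block, the wasted memory per sequence is bounded by the block size, regardless of how long the sequence eventually grows.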

The AI Market: A Game of Guessing
• Mobile gatekeepers
  - Google Android
    - Gemini
    - Number of parameters in the smallest, Nano-1: 1.8B (a little too big for mobile...)
  - Apple iOS
    - Language models in Siri: "The World's Most Powerful AI Brand"
    - Only "when" is left
    - Ferret made public (Oct. 2023): vision fine-tuning model based on Vicuna
  - Meta
    - Persona bots in Meta: planned for Sep. 2023, but postponed or cancelled?
    - Llama 2 available as a service on Microsoft Azure, Google Cloud and Alibaba
• Open-source LLMs
  - Mosaic MPT and Falcon: offer a foundation model in your hands
  - Llama 2: puts you on the same playing field as cloud providers
  - Training and inferencing a variety of multilingual models
  - Fine-tuning process gets cheaper
    - Lablup automates fine-tuning and serving in enterprise organizations
• Korean homegrown models
  - LG, Naver, KT, Kakao, etc.
    - Developing foundation models between 11B and 250B
    - It's hard to tell because they haven't published open models yet. Next time...
  - Alternative example: SOLAR (Upstage, 2023)
    - Dividing / merging layers of the Llama 2 7B model structure to increase size, combined with Mistral
https://arxiv.org/abs/2310.07704v1
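The remark that 1.8B parameters is "a little too big for mobile" can be checked with back-of-the-envelope arithmetic. The sketch below counts weight storage only and is an assumption-laden estimate: the helper `weight_memory_gb` is hypothetical, the bytes-per-parameter table assumes dense storage with no overhead, and the KV cache, activations, and runtime are ignored, so real memory use would be higher.

```python
# Assumed storage cost per parameter for each numeric format.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


NANO_1_PARAMS = 1.8e9  # parameter count quoted on the slide
footprints = {p: weight_memory_gb(NANO_1_PARAMS, p) for p in BYTES_PER_PARAM}
for precision, gigabytes in footprints.items():
    print(f"{precision}: {gigabytes:.1f} GB")
```

Under these assumptions the weights alone take about 3.6 GB in FP16 and about 0.9 GB in INT4, which is why low-bit quantization matters for on-device models.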

AI Market: Race Follow-up / India Case
• Case study: India
  - Krutrim (Krutrim Si Designs, Dec. 15, 2023)
    - First LLM to cover most languages in the Indian subcontinent
    - Multimodal model / trained with 2T tokens for the Pro version
  - OpenHathi (Sarvam AI)
    - Imports the Llama 2 7B structure and trains it in Hindi
  - BharatGPT (Corover.ai, Dec. 2023)
    - Support for 12 national languages
    - Open-sourcing state-led development: Bhashini (National Language Translation Mission under MeitY)
  - Project Vaani (Vaani and Google, 2023)
    - A project to make India's digitalization more inclusive
    - Aims to collect and open-source speech data in various regional languages: 773 locations / Indian languages
https://www.business-standard.com/companies/news/ola-s-bhavish-unveils-krutrim-the-multi-lingual-ai-for-1-4-bn-indians-123121500874_1.html
https://economictimes.indiatimes.com/tech/technology/corover-ai-officially-launches-bharatgpt-in-partnership-with-google-cloud/articleshow/105912061.cms
https://vaani.iisc.ac.in

Risk factors
• Bias
  - Microsoft Tay (2016) and Google LaMDA (2022)
  - Racism in Amazon Rekognition (2023)
• Safety
  - Jailbreak almost every language model in existence (July 27, 2023)
  - You can break through the guardrails and ask anything
• Fairness
  - Racial bias in Amazon's interview AI (2020)
  - Google's Genesis news-writing AI test (July 19, 2023)
https://arxiv.org/abs/2307.15043

2024: Predictions 1
• Rise of multimodal models
  - Where AI and IT are headed as the technology matures
    - Point users in the direction of paying
  - Mature technologies: vision, image GenAI, LLM
  - Advances in the LLM + vision multimodal area
    - Derive reasoning-based BI
  - Co-pilot: reduce the difficulty of using expert systems
  - Examples
    - Microsoft Office Copilot / Google Duet AI
    - Unity AI (Muse, Sentis), Unreal AI (Unreal Engine)
    - Midjourney v6, SDXL Turbo (Dec. 2023)
• AI generated by AI
  - Similar to the knowledge distillation model (G. Hinton 2015, Google 2017)
  - Step 1: Train your AI on AI-generated data
    - ShareGPT (Jun. 2023), etc.
    - Various cases have already been reported: ByteDance's OpenAI account blocked (Dec. 16, 2023)
  - Step 2: Optimize and lighten your AI-powered models
    - AI auto-build pipeline (AiZip, Dec. 2023)
• Automate AI design: design AI structures and apply AI across MLOps
  - AutoML / MLOps on Vertex AI (Google, 2023)
    - Automating AI development with Duet AI
    - Fueled by Gemini
https://www.theverge.com/2023/12/15/24003542/openai-suspends-bytedances-account-after-it-used-gpt-to-train-its-own-ai-model
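The knowledge distillation idea that the "AI generated by AI" bullet compares itself to can be written down as a small loss function. This is a sketch of the commonly used soft-target form, assuming a KL divergence between temperature-softened teacher and student distributions scaled by the squared temperature. The function names, the temperature value, and the logits are illustrative assumptions, not taken from the talk.

```python
import math
from typing import List, Sequence


def softmax_t(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Made-up logits: the student matches the teacher in one case and not the other.
teacher = [4.0, 1.0, 0.5]
matched = distillation_loss(teacher, [4.0, 1.0, 0.5])
mismatched = distillation_loss(teacher, [0.5, 1.0, 4.0])
print(matched, mismatched)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal a smaller model is trained on when it learns from a larger model's outputs.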

2024: Predictions 2
• On-device AI, advanced
  - Models that go beyond IoT-level on-device AI
  - Fueled by open AI models
• Desktop AI
  - Desktop/PC: has enough computational resources to run a good-sized model
  - Attempts to run AI features on the desktop
    - CPU: adds machine-learning-specific instruction sets
      - VNNI instruction in AVX (2022)
      - Apple M1/M2/M3 (2020)
    - Desktop GPU/NPU: desktop-compatible hardware dedicated to machine learning computations
      - Intel Core Ultra / Meteor Lake (Dec. 2023)
      - AMD Ryzen AI (Jan. 2024)
    - Apple M series: CPU/GPU with unified memory architecture on desktop
    - Intel, AMD, NVIDIA: datacenter-level APUs, e.g. Xeon Max, AMD MI series and NVIDIA GH200
• Smartphone AI
  - Performance limitations prevent advanced features
  - Tensor G3 on Pixel 8 (Google, 2023)
    - Gain performance with workload delegation to the cloud

Responsible AI: Outlook and regulatory expectations
• View
  - Provide an unbiased model
  - Advancing cross-domain applications
  - Increasing need for AI application guidelines
• Moving guidelines
  - Frontier Model Forum: creating a forum for self-regulation
    - Google, Microsoft, OpenAI, and Anthropic, among others
    - Drive self-regulation around copyright, deepfakes, and fraud
  - AI legislation in the EU
    - You can't leave it to chance
    - Big tech and open-source camps argue for regulatory separation (Jul. 26, 2023)
  - Responsible Scaling Policy (Anthropic, Sep. 21, 2023)
    - Proposes four levels of safety (ASL), taking concepts from BSL
    - Prevent the delivery of dangerous data and concepts to users
    - Introduce autonomy controls and safety checks
https://venturebeat.com/ai/hugging-face-github-and-more-unite-to-defend-open-source-in-eu-ai-legislation/
https://www.theverge.com/2023/7/26/23807218/github-ai-open-source-creative-commons-hugging-face-eu-regulations
https://venturebeat.com/ai/anthropics-new-policy-takes-aim-at-catastrophic-ai-risks/

Thank you!
• contact@lablup.com
• https://www.facebook.com/lablupInc
• Lablup Inc.: https://www.lablup.com
• Backend.AI: https://www.backend.ai
• Backend.AI GitHub: https://github.com/lablup/backend.ai
• Backend.AI Cloud: https://cloud.backend.ai
