INTERFACE by apidays 2023 - Open Source ML, Omar Sanseviero, Hugging Face

INTERFACE by apidays 2023
APIs for a “Smart” economy. Embedding AI to deliver Smart APIs and turn into an exponential organization
June 28 & 29, 2023

Open Source ML - from pretrained models to production
Omar Sanseviero, Machine Learning Engineering Lead, Hugging Face

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8

Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io

Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/

apidays

July 11, 2023

Transcript

  1. Open Source ML - from pretrained models to production
    Run State-of-the-Art Open Source LLMs in Production
  2. The Hugging Face Hub
    • Models: access over 200k models shared by the community.
    • Spaces: build ML apps and demos to showcase how models work.
    • Datasets: share, access and collaborate on over 45k datasets.
  3. The Hugging Face Hub (same content as slide 2, annotated with growth figures)
    • Models: 99k → 200k
    • Spaces: 19k → 60k
    • Datasets: 16k → 45k
  4. The Model Hub
    • Models across modalities (computer vision, NLP, audio, multimodal, RL, tabular)
    • Multiple libraries (PyTorch, Keras, fastai, spaCy, NeMo, PaddlePaddle, Stanza, timm)
    • 180+ supported languages
    • Model cards for documentation
      ◦ Metrics reporting
      ◦ CO2 emissions
      ◦ TensorBoard hosting
      ◦ Interactive widgets
  5. Recent popular models
    • StarCoder: code generation; 15.5B parameters; OpenRAIL license; 80+ programming languages; 1 trillion tokens
    • LLaMA: large ecosystem; 7B to 65B parameters; non-commercial license; 1-1.4 trillion tokens
    • Falcon: best open-source (OS) model; 7B to 40B parameters; Apache 2.0 license; multilingual; 1 trillion tokens
  6. Challenges
    • Evaluation: existing benchmarks don't fully capture real-world use cases (e.g., multi-turn conversations).
    • Customizability: users want models tuned to their own data or use cases while preserving privacy.
    • Model size: LLMs require lots of memory, might not fit on a single machine, and require complex parallelism and communication.
    • Optimization: because of model size, latency and throughput often suffer, so optimized models are needed.
  7. Some things you can do
    • Loading: load in 4-bit or 8-bit mode (bitsandbytes, accelerate). Falcon 40B runs with 45GB (8-bit) or 27GB (4-bit) of RAM.
    • Multi-GPU: distribute among GPUs (accelerate). Set device_map="auto", or even offload layers to CPU (slow).
    • Inference libraries: use tools optimized for LLMs (text-generation-inference). Used by HF in production!
  8. Text-generation-inference (TGI)
    • Features: tensor parallelism, token streaming, metrics and monitoring, quantization, optimizations, security
    • TGI supports most popular LLMs, such as StarCoder and SantaCoder, Falcon, LLaMA, Galactica and OPT, and GPT-NeoX.
  9. Training, Fine-tuning, PEFT (Parameter-Efficient Fine-Tuning)
    • Training: $$$; lots and lots of data; lots of expertise
    • Fine-tuning: $$; much less data and compute
    • PEFT: $; even less compute
    You can fine-tune Whisper or Falcon-7B in a free Google Colab.
  10. Example: Whisper
    • Full fine-tuning: results in OOM (out-of-memory) errors.
    • LoRA: only 1% of parameters are trainable and the batch size is 5x larger; a 1.6B-parameter model can be fine-tuned with less than 8GB of GPU VRAM; the resulting checkpoints were less than 1% the size of the original model.
  11. QLoRA
    • 4-bit quantization: a 4-bit quantized pretrained LM
    • Base model with multiple adapters
    • RLHF
    • Efficient: fine-tune a 65B-parameter model on a single 48GB GPU
  12. Why demos?
    • Easily present to a wide audience
    • Increase reproducibility of research
    • Diverse users can identify and debug failure points
  13. Thanks!
    Omar Sanseviero, @osanseviero
    CREDITS: This presentation template was created by Slidesgo, and includes icons by Flaticon, infographics & images by Freepik and illustrations by Storyset and Chunte Lee.
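
Slides 6 and 7 say LLMs need a lot of memory and quote Falcon 40B at 45GB (8-bit) or 27GB (4-bit). A rough way to see where numbers like these come from is parameter count times bytes per parameter. The sketch below is illustrative only: the helper name and dtype table are assumptions, not from the talk, and it counts weights only. Activations, the KV cache and runtime overhead come on top, which presumably explains the gap to the slide's figures.

```python
# Hypothetical helper: estimates the memory needed for model *weights only*
# (no activations, KV cache, optimizer state, or framework overhead).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,  # two 4-bit values per byte
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the bytes needed to store `num_params` weights in `dtype`, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    falcon_40b = 40e9  # nominal parameter count taken from the model name
    for dtype in ("fp32", "fp16", "int8", "int4"):
        print(f"{dtype}: {weight_memory_gb(falcon_40b, dtype):.0f} GB")
```

Under these assumptions the weights alone take about 80 GB in fp16, 40 GB in 8-bit and 20 GB in 4-bit, which is why quantization is what makes a single-machine setup plausible.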
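
Slide 7 suggests setting device_map="auto" and, if needed, offloading layers to CPU. Roughly speaking, Accelerate fills the available GPUs first, then CPU memory (then disk), placing whole layers. The sketch below is a hypothetical, simplified greedy placement to illustrate that idea; it is not Accelerate's implementation, which also handles details such as tied weights, modules that must not be split, and reserved memory. The function name and the device labels are assumptions.

```python
from typing import Dict, List


def greedy_device_map(layer_sizes_gb: List[float], budgets_gb: Dict[str, float]) -> Dict[int, str]:
    """Assign layers, in order, to the first device that still has room.

    `budgets_gb` is ordered (dicts keep insertion order): GPUs first, "cpu" last,
    so layers that do not fit on the GPUs are offloaded to CPU memory.
    """
    remaining = dict(budgets_gb)  # copy so the caller's budgets are not mutated
    devices = list(remaining)
    placement: Dict[int, str] = {}
    idx = 0
    for layer, size in enumerate(layer_sizes_gb):
        # Move to the next device once the current one cannot hold this layer.
        # We never go back, so consecutive layers stay on the same device.
        while idx < len(devices) and remaining[devices[idx]] < size:
            idx += 1
        if idx == len(devices):
            raise MemoryError(f"layer {layer} ({size} GB) does not fit on any device")
        remaining[devices[idx]] -= size
        placement[layer] = devices[idx]
    return placement


if __name__ == "__main__":
    layers = [10.0] * 8  # eight hypothetical 10 GB layers
    budgets = {"cuda:0": 24.0, "cuda:1": 24.0, "cpu": 64.0}
    print(greedy_device_map(layers, budgets))
```

Keeping layers contiguous means activations only cross a device boundary a few times per forward pass; the layers that land on "cpu" are the slow part the slide warns about.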
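
Slide 8 lists tensor parallelism among TGI's features. The basic idea is to split a single layer's weight matrix across GPUs so that each device holds and multiplies only a slice. The sketch below simulates one common scheme, a column-parallel linear layer, in plain Python; the "devices" are just list entries, and this is an illustration of the technique, not TGI's code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split w (in_features x out_features) into `num_shards` column slices."""
    cols = len(w[0])
    assert cols % num_shards == 0, "out_features must divide evenly across shards"
    step = cols // num_shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(num_shards)]


def column_parallel_linear(x: Matrix, w: Matrix, num_shards: int) -> Matrix:
    """Each simulated device computes x @ w_shard on its own slice; the partial
    outputs are then concatenated column-wise (an all-gather on real hardware)."""
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    return [sum((p[i] for p in partial_outputs), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
    print(column_parallel_linear(x, w, num_shards=2))
    print(matmul(x, w))  # same result, computed on one "device"
```

Each device stores only 1/num_shards of the weights, which is what lets a model too large for one GPU be served at all; the cost is the communication step that gathers the partial outputs.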
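
Slides 9 and 10 describe PEFT with LoRA: about 1% of parameters are trainable and checkpoints shrink to under 1% of the original size. The talk shows no code, so the sketch below only illustrates the standard LoRA formulation, h = W0·x + (alpha/r)·B·A·x, with W0 frozen, A small and random, and B initialized to zero so training starts from the pretrained behavior. The class and function names are hypothetical; libraries such as Hugging Face PEFT implement LoRA, but this is not their API.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product m @ v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W0 (d x k) plus a trainable low-rank update B @ A."""

    def __init__(self, w0: Matrix, rank: int, alpha: float, seed: int = 0):
        d, k = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # frozen: never updated during fine-tuning
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(rank)]  # r x k
        self.b = [[0.0] * rank for _ in range(d)]  # d x r, zero-init so B @ A = 0 at start
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.w0, x)
        update = matvec(self.b, matvec(self.a, x))  # B (A x), never materializes B @ A
        return [h + self.scale * u for h, u in zip(base, update)]

    def trainable_params(self) -> int:
        """Only A and B are trained (and saved in the adapter checkpoint)."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def trainable_fraction(d: int, k: int, rank: int) -> float:
    """Share of a d x k layer that LoRA trains: r(d + k) / (d k)."""
    return rank * (d + k) / (d * k)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], rank=1, alpha=1.0)
    print(layer.forward([1.0, 1.0]))  # equals W0 @ x because B starts at zero
    print(f"{trainable_fraction(4096, 4096, 8):.2%} of a 4096x4096 layer at r=8")
```

Because only A and B change, the saved adapter is tiny relative to W0, which is consistent with the slide's point that LoRA checkpoints are a small fraction of the full model; the exact percentage depends on the rank and on which layers get adapters.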
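
Slide 11 says QLoRA fine-tunes on top of a 4-bit quantized pretrained LM. QLoRA itself uses a specific 4-bit data type (NF4) with block-wise scaling; the sketch below deliberately swaps in a simpler scheme, symmetric absmax quantization to signed integers in [-7, 7] with one scale per block, just to show what "block-wise 4-bit" means. The function names and the default block size are assumptions, and codes are kept as Python ints rather than packed two per byte.

```python
from typing import List, Tuple


def quantize_blockwise_int4(values: List[float], block_size: int = 64) -> Tuple[List[int], List[float]]:
    """Symmetric absmax quantization to 4-bit signed codes in [-7, 7], one scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / 7 if absmax > 0 else 1.0  # avoid dividing by zero for all-zero blocks
        scales.append(scale)
        # Round to the nearest code and clamp in case of floating-point drift.
        codes.extend(max(-7, min(7, round(v / scale))) for v in block)
    return codes, scales


def dequantize_blockwise_int4(codes: List[int], scales: List[float], block_size: int = 64) -> List[float]:
    """Recover approximate values: code times the scale of the block it belongs to."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    weights = [0.1, -0.5, 0.7, 0.0, 10.0, -2.0, 3.0, 1.0]
    codes, scales = quantize_blockwise_int4(weights, block_size=4)
    print(codes)
    print(dequantize_blockwise_int4(codes, scales, block_size=4))
```

Per-block scales keep one outlier (the 10.0 here) from wrecking the precision of every other weight. The scales themselves cost extra memory, which QLoRA reduces with what it calls double quantization.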