

Scaling and Optimizing the Performance of Deep Learning Models: Strategies for Maximizing Efficiency - DevSummit 2024

Serdar Biçer

October 14, 2024

Transcript

  1. AGENDA / Key challenges in scaling deep learning models / Strategies to optimize performance and efficiency / Real-world examples and techniques to overcome these challenges www.valven.com \ 2
  2. DEEP LEARNING www.valven.com \ 3 STRATEGY DESIGN DEVELOPMENT Deep learning (DL) is a subset of machine learning (ML) / Powered by artificial neural networks / Mimics the human brain with neuron-like units / Tackles complex datasets and tasks / Deep Learning ⊂ Machine Learning ⊂ Artificial Intelligence
  3. DEEP LEARNING www.valven.com \ 4 Machine Learning: INPUT → FEATURE EXTRACTION → CLASSIFICATION → OUTPUT / Deep Learning: INPUT → FEATURE EXTRACTION + CLASSIFICATION → OUTPUT
  4. PROBLEMS WITH DEEP LEARNING www.valven.com \ 6 1 > Requires large datasets, struggles with small/simple data 2 > Requires high-end GPUs, higher computational cost 3 > Longer training (hours to weeks)
  5. CHALLENGES IN DEEP LEARNING NETWORKS www.valven.com \ 7 1 > Hardware Limitations 2 > Training Instability 3 > Hyperparameter Tuning 4 > Energy Consumption & Cost
  6. STRATEGIES FOR SCALING DEEP LEARNING MODELS www.valven.com \ 8 STRATEGIES / Distributed Training / Mixed Precision Training / Gradient Accumulation / Quantization / Distillation
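Of the strategies listed, quantization is the quickest to try on a trained model. A minimal sketch using PyTorch's dynamic quantization — the toy `nn.Sequential` model and its layer sizes are illustrative placeholders, not taken from the deck:

```python
import torch
import torch.nn as nn

# Illustrative FP32 model; the sizes are arbitrary placeholders
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Dynamic quantization: nn.Linear weights are stored as int8,
# activations are quantized on the fly at inference time
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(4, 128)
out = qmodel(x)  # inference runs with int8 weight kernels
```

Dynamic quantization needs no calibration data, which makes it a low-effort first step before static quantization or distillation.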
  7. MIXED PRECISION TRAINING www.valven.com \ 11 Forward Pass: Weights FP16 → activations FP16 / Back-Propagation: Gradients FP16 → cast to Gradients FP32 / Optimizer (FP32) updates master Weights FP32 → cast back to Weights FP16
  8. MIXED PRECISION TRAINING www.valven.com \ 12 [Charts: Peak Memory in GB (lower is better); Accuracy in % (higher is better)]
  9. GRADIENT ACCUMULATION www.valven.com \ 14 [Charts: Peak Memory in GB (lower is better); Accuracy in % (higher is better)]
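Gradient accumulation trades the peak memory shown above for extra steps: gradients from several small micro-batches are summed before a single optimizer update, simulating a larger batch. A short sketch — the toy model, batch sizes, and learning rate are illustrative placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accum_steps = 4           # one update per 4 micro-batches
micro_batches = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(8)]

optimizer.zero_grad()
steps_taken = 0
for i, (x, y) in enumerate(micro_batches):
    # Divide so the accumulated gradient averages over micro-batches
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()       # gradients add up across micro-batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()  # single update for the effective large batch
        optimizer.zero_grad()
        steps_taken += 1
```

Only one micro-batch of activations is ever held in memory, which is why peak memory drops while accuracy tracks large-batch training.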
  10. EXAMPLE WITH HOROVOD & PYTORCH www.valven.com \ 19

      mpirun -np 4 -H node1:2,node2:2 \ python train.py
      horovodrun -np 4 -H node1:2,node2:2 \ python train.py

      import torch
      import torch.optim as optim
      import horovod.torch as hvd
      …
      hvd.init()
      def train(epochs=5, …):
          sampler = torch.utils.data.distributed.DistributedSampler(
              dataset, num_replicas=hvd.size(), rank=hvd.rank())
          # Start all workers from the same weights
          hvd.broadcast_parameters(model.state_dict(), root_rank=0)
          # Scale the learning rate by the number of workers
          optimizer = optim.SGD(model.parameters(), lr=0.01 * hvd.size())
          optimizer = hvd.DistributedOptimizer(
              optimizer, named_parameters=model.named_parameters())
          …
          for epoch in range(epochs):
              …
  11. Software Development Challenges SDLC www.valven.com \ 20 CIRCUMSTANCES > Remote working habits > Work-life balance > Rising number of software needs > Trend to catch up with the competition DIFFICULTIES > Unclear delivery cycle > Intuitive decisions without data > Unpredictable planning > Challenges in team visibility
  12. VALVEN ATLAS / Architecture / Track Development / Workflow Automation / DORA Metrics / Actionable Insights / Business Alignment / Resource Management www.valven.com \ 21