DeepSeek on AWS

© 2025, Amazon Web Services, Inc. or its affiliates. ©
2025, Amazon Web Services, Inc. or its affiliates. Amazon Bedrock과 SageMaker AI를 활용한 DeepSeek R1 모델 배포 및 운영 방법 김성민 Sr. AI/ML Specialist Solutions Architect AWS

© 2025, Amazon Web Services, Inc. or its affiliates. Agenda
• DeepSeek Models on AWS: Hosting, Fine-tuning, and Training § Amazon Bedrock § Amazon SageMaker JumpStart § Amazon SageMaker Endpoint § Amazon SageMaker HyperPod • Key Benefits of Leveraging DeepSeek Models on AWS § Security § Operational Excellence § Cost

© 2025, Amazon Web Services, Inc. or its affiliates. DeepSeek-R1
Performance & Evaluation (source: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf)

© 2025, Amazon Web Services, Inc. or its affiliates. DeepSeek-R1
models DeepSeek-R1-Distill models Model #Total Params #Activated Params Context Length DeepSeek-R1-Zero 671B 37B 128K DeepSeek-R1 671B 37B 128K Model Base Model DeepSeek-R1-Distill-Qwen-1.5B Qwen2.5-Math-1.5B DeepSeek-R1-Distill-Qwen-7B Qwen2.5-Math-7B DeepSeek-R1-Distill-Llama-8B Llama-3.1-8B DeepSeek-R1-Distill-Qwen-14B Qwen2.5-14B DeepSeek-R1-Distill-Qwen-32B Qwen2.5-32B DeepSeek-R1-Distill-Llama-70B Llama-3.3-70B-Instruct https://huggingface.co/deepseek-ai/DeepSeek-R1

© 2025, Amazon Web Services, Inc. or its affiliates. Prompts
Responses Distilled model Advanced model (teacher) Fine-tuned cost-efficient model (student) Match the performance of advanced models with cost- efficient models for your use case with Model Distillation

© 2025, Amazon Web Services, Inc. or its affiliates. Challenges
Security Cost Operational Excellence ML App interface

© 2025, Amazon Web Services, Inc. or its affiliates. How
to Securely Use DeepSeek Models on AWS Amazon Bedrock Amazon SageMaker AI API Layer Amazon Bedrock Foundation Models Prompt / text embeddings Fine-tune SageMaker Training and Inference Prompt / text embeddings API Layer SageMaker Endpoint Foundation Models SageMaker Jumpstart Model hub, deploy, fine-tune Accelerated Computing Trn1(n), Inf2, P4d, P5 Fine-tune

2025, Amazon Web Services, Inc. or its affiliates. Amazon Bedrock The easiest way to build and scale generative AI applications with powerful tools and foundation models

© 2025, Amazon Web Services, Inc. or its affiliates. AMAZON
NOVA JAMBA CLAUDE COMMAND EMBED RERANK LLAMA LUMA RAY 2 Effective reasoning & rapid analysis for long context windows High-quality AI image generation, easily deployable at scale Advanced image & language reasoning Knowledge summarization, expert agents, & code completion High-quality video generation from text & images Software engineering AI for large enterprises STABLE DIFFUSION STABLE IMAGE MISTRAL MIXTRAL MALIBU POINT Frontier multimodal intelligence at low- latency, Agent & RAG Applications, high-quality image & video generation Advanced reasoning & coding capabilities, including computer use skills Multimodal search & advanced retrieval powering multilingual knowledge agents Amazon Bedrock B R O A D C H O I C E O F M O D E L S Coming soon Amazon Bedrock Marketplace enables developers to discover, test, and use over 100 popular, emerging, and specialized foundation models (FMs) alongside the current selection of industry-leading models in Amazon Bedrock. DeepSeek-R1 model is now available in Amazon Bedrock Marketplace. As of 10 March 2025, the fully managed DeepSeek-R1 model is now generally available in Amazon Bedrock.

© 2025, Amazon Web Services, Inc. or its affiliates. converse()
– DeepSeek-R1

© 2025, Amazon Web Services, Inc. or its affiliates. Amazon
Bedrock APIs for Model Invocation API 명칭 주요 기능 스트리밍 지원 InvokeModel 단일 프롬프트 기반의 응답 생성 X Converse 대화 기반의 응답 생성 X InvokeModelWithResponseStream 스트리밍 방식의 단일 프롬프트 응답 생성 O ConverseStream 스트리밍 방식의 대화 기반 응답 생성 O

© 2025, Amazon Web Services, Inc. or its affiliates. Step
1: Request access for DeepSeek-R1 model

2: Select model

3: Playground or Converse API

to Use DeepSeek Models on Amazon Bedrock Amazon Bedrock Marketplace Custom Model Import SageMaker JumpStart 2 3 4 Fully Managed Serverless Model 1

© 2025, Amazon Web Services, Inc. or its affiliates. Bedrock
Marketplace implementation • Bedrock Marketplace enables core DeepSeek-R1 deployment in managed endpoints • Complete code samples and step-by-step deployment guides provided for quick implementation • Standard Bedrock security and monitoring features

© 2025, Amazon Web Services, Inc. or its affiliates. Bedrock
Marketplace delivers 100+ models from 30+ providers EVOLUTIONARY SCALE WIDN CAMB.AI GRETEL ARCEE AI PREFERRED NETWORKS WRITER UPSTAGE NCSOFT STOCKMARK KARAKURI JOHN SNOW LABS LIQUID DATABRICKS CYBERAGENT HUGGING FACE STABILITY AI LG AI RESEARCH M I S T R A L AI SNOWFLAKE N V I D I A DEEPSEEK

© 2025, Amazon Web Services, Inc. or its affiliates. Prerequisite:
Increase your ml.p5e.48xlarge limits before deployment

1: Find the DeepSeek-R1 model on the catalog

1: Find the DeepSeek-R1 model on the catalog (cont’d)

2: Set options (ml.p5e.48xl by default) and deploy

3: Playground or InvokeModel API

© 2025, Amazon Web Services, Inc. or its affiliates. Custom
Model Import implementation • Bedrock Custom Model Import enables DeepSeek deployment • Support for Llama 8B and 70B distilled DeepSeek R1 variants • Complete code samples and step-by-step deployment guides provided for quick implementation • Standard Bedrock security and monitoring features • Pricing is on-demand in 5-minute window from first successful invocation • There is a cold-start and scaling up/down time

1: Create Custom Model Import Job

2: Import DeepSeek-R1-Distill model

2: Import DeepSeek-R1-Distill model (cont’d)

3: Playground or Converse API

© 2025, Amazon Web Services, Inc. or its affiliates. You
are always in control of your data None of the customer’s data is used to train the underlying model Data remains in the Region where the API is processed Support for GDPR, SOC, ISO, CSA compliance, and HIPAA eligibility

© 2025, Amazon Web Services, Inc. or its affiliates. Critical
Concerns • Models hosted by AWS without any communication with DeepSeek servers or APIs • No customer data used to improve base models • Enterprise-grade data protection capabilities • Privacy control through AWS services

© 2025, Amazon Web Services, Inc. or its affiliates. AWS
Region network Client account Customer VPC Corporate network Client API endpoint Client Amazon Bedrock service account Amazon Bedrock service Amazon Bedrock Client connectivity

Region network Client account Customer VPC Corporate network Internet Client API endpoint Client Amazon Bedrock service account Amazon Bedrock service Internet gateway Amazon Bedrock Client connectivity

Region network Client account Customer VPC AWS PrivateLink aka VPC endpoint Corporate network Client AWS Direct Connect API endpoint Client Amazon Bedrock service account Amazon Bedrock service Amazon Bedrock Client connectivity

Bedrock Integration Choice Customization Security and governance

to Securely Use DeepSeek Models on AWS Amazon Bedrock Amazon SageMaker AI API Layer Amazon Bedrock Foundation Models Prompt / text embeddings Fine-tune SageMaker Training and Inference Prompt / text embeddings API Layer SageMaker Endpoint Foundation Models SageMaker Jumpstart Model hub, deploy, fine-tune Accelerated Computing Trn1(n), Inf2, P4d, P5 Fine-tune

2025, Amazon Web Services, Inc. or its affiliates. Amazon SageMaker AI Build, train, and deploy ML models at scale, including FMs

© 2025, Amazon Web Services, Inc. or its affiliates. Deploy
DeepSeek models with Amazon SageMaker AI SageMaker Endpoint SageMaker JumpStart 1 2

© 2025, Amazon Web Services, Inc. or its affiliates. Model
Deployment on Amazon SageMaker AI Single model deployment Single container Multi-container Invoke Response Inference Pipelines Real-time synchronous response Serverless GPUs CPUs Near real-time asynchronous response Invoke Response Offline batch inference Submit Complete Amazon SageMaker AI Multi-Model deployment Model Container Infrastructure

© 2025, Amazon Web Services, Inc. or its affiliates. A
strong partnership between AWS and Hugging Face Hugging Face is the most popular Open Source company providing state of the art NLP technology Hugging Face SageMaker offers high performance resources to train and use NLP Models AWS https://huggingface.co/ https://aws.amazon.com/sagemaker/

© 2025, Amazon Web Services, Inc. or its affiliates. Large
Model Inference (LMI) container Large ML models with 100 billion + parameters Easily parallelize models across multiple GPUs to fit models into the instance and achieve low latency Deploy models on the most performant and cost- effective GPU-based instances or on AWS Inferentia Leverage 500GB of Amazon EBS volume per endpoint

SageMaker Deployment Hosting Services Inference Image Training Image Training Data Model artifacts Amazon SageMaker Amazon S3 Amazon ECR

SageMaker Deployment Hosting Services Inference Image Training Image Training Data Model artifacts Amazon SageMaker Amazon S3 Amazon ECR Model artifacts

SageMaker Deployment Hosting Services Inference Image Training Image Training Data Model artifacts Amazon SageMaker Amazon S3 Amazon ECR Model artifacts Inference Image

SageMaker Deployment Hosting Services Inference Image Training Image Training Data Model artifacts Endpoint Amazon SageMaker Amazon S3 Amazon ECR Model artifacts Inference Image

model to SageMaker Real-time Endpoint

model to SageMaker Real-time Endpoint model.tar.gz ├ model.py └ serving.properties

SageMaker Deployment SageMaker Endpoints (Private API) Auto Scaling group Availability Zone 1 Availability Zone 2 Availability Zone 3 Elastic Load Balancing Model Endpoint Client Deployment / Hosting Amazon SageMaker ML Compute Instances Input Data (Request) Prediction (Response)

SageMaker Deployment SageMaker Endpoints (Public API) Auto Scaling group Availability Zone 1 Availability Zone 2 Availability Zone 3 Elastic Load Balancing Model Endpoint Amazon API Gateway Client Deployment / Hosting Amazon SageMaker ML Compute Instances Input Data (Request) Prediction (Response)

DeepSeek models with Amazon SageMaker AI SageMaker Endpoint SageMaker JumpStart 1 2

© 2025, Amazon Web Services, Inc. or its affiliates. Machine
learning (ML) hub with foundation models (FMs), built-in algorithms, and prebuilt ML solutions that you can deploy with just a few clicks Amazon SageMaker Jumpstart − Publicly available FMs − Built-in ML algorithms − Customizable solutions − Supports collaboration

from SageMaker JumpStart 59

© 2025, Amazon Web Services, Inc. or its affiliates. 60
Deploy from SageMaker JumpStart (cont’d)

© 2025, Amazon Web Services, Inc. or its affiliates. Use
Bedrock tooling with SageMaker JumpStart Models Client Amazon SageMaker JumpStart Amazon Bedrock access using SageMaker SDK, Boto3 access using Bedrock API

© 2025, Amazon Web Services, Inc. or its affiliates. Benefits
of using Bedrock tooling with SageMaker JumpStart Models Client Amazon SageMaker JumpStart Amazon Bedrock access using SageMaker SDK, Boto3 access using Bedrock API Amazon Bedrock Guardrails Knowledge Bases for Amazon Bedrock Amazon Bedrock Agents …

to Use DeepSeek Models on AWS: A One-Page Guide (2) Custom Model Import 3 (3) Bedrock + SageMaker JumpStart (1) SageMaker JumpStart 1 (2) SageMaker Endpoint 2 Amazon Bedrock Amazon Bedrock Marketplace Amazon S3 Client Amazon S3 Hugging Face Amazon SageMaker Endpoint Client Amazon SageMaker JumpStart Client Amazon SageMaker JumpStart Amazon Bedrock 4 (1) AWS Marketplace 2 (1) Fully Managed Serverless Model 1 User

• AWS protects model tuner/consumer’s data • AWS protects model
provider’s IP • Proprietary model package and endpoint is hosted in SageMaker/Bedrock owned escrow account • Containers have no outbound network access Security

© 2025, Amazon Web Services, Inc. or its affiliates. Save
costs by deploying on Amazon SageMaker Infrastructure cost Operations cost Infrastructure cost Operations cost Security and compliance cost • Compute instances • Storage • Network Operating, managing, and maintaining infrastructure Security and compliance for ML features, encrypt data and models, access policies, track and trace Deploy on SageMaker Self-managed deployment on Amazon EKS or Amazon ECS

AI Chips AWS for generative AI AWS Inferentia AWS Trainium AWS Trainium2 Lowest cost Best price performance Lowest cost to train up to 70b models Highest performance for frontier models AWS Inferentia2

Inferentia2: High performance, less power, lower cost R E A L - T I M E D E P L O Y M E N T B E R T - L A R G E W I T H A W S I N F E R E N T I A 2 50% Fewer instances GPU Instances Inf2.2xl Instances Number of instances 50% Less energy GPU Instances Inf2.2xl Watts Power 65% Lower cost GPU Instances Inf2.2xl USD Inference cost

Trainium2: Highest performance for frontier models L L M T R A I N I N G P E R F O R M A N C E Lower cost-to-train Step Time 12.9 Amazon EC2 P5 Instances F u j i 7 0 B T R A I N I N G Amazon EC2 Trn2 Instances JAX/AXLearn framework, 64 node cluster 8.7

2025, Amazon Web Services, Inc. or its affiliates. Amazon SageMaker HyperPod Scale and accelerate generative AI model development across thousands of AI accelerators

© 2025, Amazon Web Services, Inc. or its affiliates. Model
Builder Model Consumer Model Tuner Use FMs out-of-the-box Finetune FMs for specific domain/workload Build FMs or retrain open source FMs from scratch Low entry cost and complexity, faster TTM Strong control and flexibility Customer Pathways with Foundation Models

© 2025, Amazon Web Services, Inc. or its affiliates. Low
entry cost and complexity, faster TTM Strong control and flexibility Re-train new FM models using DeepSeek R1 Finetune DeepSeek R1 and R1 distilled Deploy DeepSeek R1 and R1 distilled Use FMs out-of-the-box Finetune FMs for specific domain/workload Build FMs or retrain open source FMs from scratch Customer Pathways with SageMaker AI for DeepSeek

© 2025, Amazon Web Services, Inc. or its affiliates. Build
models will require Scale this Single instance to this . . . . . . . . . . . . . .

SageMaker AI for Large Scale FM Training SageMaker HyperPod SageMaker Training Jobs Customer is proficient with SLURM & EKS AND Customer wants to persistent cluster & ability to customize and manage orchestration Customer wants a managed user experience with ephemeral clusters & focus on ML (to accelerate time to market) OR Customer needs access to flexible on- demand GPU cluster v You can fine-tune DS R1 and R1 distilled models using your choice of libraries

© 2025, Amazon Web Services, Inc. or its affiliates. SageMaker
HyperPod example architecture Amazon S3 Head Node VPC Amazon FSx for Lustre Training data Training data Copy Once mount EFA(*) Slurm or EKS(k8s) Compute Nodes * EFA: Elastic Fabric Adapter

© 2025, Amazon Web Services, Inc. or its affiliates. Benefits
of SageMaker HyperPod YEAR 1957 2012 2014 2016 2018 2019 2020 2021 … … … Model size (# of parameters) VGG16 138M YOLO, GNMT 210M BERT-L 340M GPT-2 1.5B GPT-3 175B 2023 Perceptron 1 Alexnet 62M SWITCH-C 1.6T Ease of use & flexibility Resilience Performance • 훈련 시간 단축 및 대규모 분산 훈련 • 클러스터의 컴퓨팅, 메모리 및 네트워크 리소스 활용 최적화 • 자동 클러스터 상태 확인 및 복구 • Checkpoint 저장 및 자동 복구 기능 • 대규모 훈련 클러스터의 분산 훈련 간소화 • Slurm 또는 Amazon EKS를 통한 유연한 워크로드 관리

SageMaker HyperPod Recipes R U N F M P R E - T R A I N I N G A N D F I N E - T U N I N G W I T H A S I N G L E L I N E O F C O D E Open Source implementation Launcher scripts and recipes collection Built on NVIDIA NeMo foundations (launcher, configuration hierarchy) Over 30 recipes to get started SageMaker-optimized models (GPU) Neuron-optimized models (Trainium) Native NeMo models Custom models

© 2025, Amazon Web Services, Inc. or its affiliates. Training
plans Today (3/5) Segment 1 10 instances 7 days Segment 2 10 instances 7 days “Create a training plan with 10 instances of ml.p5.48xlarge for 14 days starting 3/10” 3/10 3/16 3/20 3/26

2025, Amazon Web Services, Inc. or its affiliates. Summary

to Use DeepSeek Models on AWS: A One-Page Guide (2) Custom Model Import 3 (3) Bedrock + SageMaker JumpStart (1) SageMaker JumpStart 1 (2) SageMaker Endpoint 2 Amazon Bedrock Amazon Bedrock Marketplace Amazon S3 Client Amazon S3 Hugging Face Amazon SageMaker Endpoint Client Amazon SageMaker JumpStart Client Amazon SageMaker JumpStart Amazon Bedrock 4 (1) AWS Marketplace 2 (1) Fully Managed Serverless Model 1 User

to Use DeepSeek Models on AWS Model Consumer Model Tuner or Builder Pre-trained Model Custom Model Amazon SageMaker JumpStart Amazon Bedrock Amazon SageMaker Training Job Amazon SageMaker Endpoint Amazon SageMaker HyperPod

© 2025, Amazon Web Services, Inc. or its affiliates. Key
Benefits of Leveraging DeepSeek Models on AWS Security ML App interface Operational Excellence through Separation of Concerns Cost saving opportunity in production

© 2025, Amazon Web Services, Inc. or its affiliates. (AWS
Blog) DeepSeek R1 on AWS Hosting DeepSeek models on SageMaker Call to Action SageMaker HyperPod Workshop

2025, Amazon Web Services, Inc. or its affiliates. 여러분의 소중한 피드백을 기다립니다. 강연 종료 후, 강연 평가에 참여해주세요!

2025, Amazon Web Services, Inc. or its affiliates. 감사합니다

DeepSeek on AWS

DeepSeek on AWS

More Decks by Sungmin Kim

Other Decks in Technology

Featured

Transcript