DeepSeek on AWS

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Yoshitaka Haribara, Ph.D.
Sr. GenAI/Quantum Startup Solutions Architect, AWS
TAI AAI #07 - Scaling AI Performance

Agenda
• DeepSeek-R1 and Distilled Models Overview
• Accelerators: NVIDIA H200 GPU and AWS Trainium
• Deployment options on AWS: Bedrock, SageMaker AI, and EC2
• Best practices

DeepSeek offers a range of open-weights models and efficient distilled variants.

Base and R1 Models (671B)
• DeepSeek-V3: Base MoE model
• DeepSeek-R1-Zero: Pure reinforcement learning
• DeepSeek-R1: Cold-start data before RL

Distilled Models
• DeepSeek-R1-Distill-Qwen (1.5B, 7B, 14B, 32B)
• DeepSeek-R1-Distill-Llama (8B and 70B)

DeepSeek enables organizations to leverage advanced reasoning capabilities across multiple tasks.

Core Capabilities
• Advanced reasoning capabilities optimized for complex problem-solving (e.g. mathematics and coding tasks).
• Strong benchmark results on AIME 2024, MATH-500, and SWE-bench Verified.
• Reportedly 90-95% more affordable than comparable models.
• 671B-parameter Mixture of Experts (MoE) architecture, with 37B parameters activated per token.
• DeepSeek-R1 requires at least 800 GB of HBM in FP8 format for inference.
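The 800 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes 1 byte per parameter in FP8 and treats everything beyond the weights (KV cache, activations, runtime buffers) as headroom; the function and constant names are illustrative and do not come from the deck.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 671e9      # total MoE parameters (from the slide)
ACTIVE_PARAMS = 37e9      # parameters activated per token (from the slide)
FP8_BYTES = 1.0           # assumption: 1 byte per parameter in FP8
REQUIRED_HBM_GB = 800.0   # minimum HBM quoted on the slide

weights_gb = weight_memory_gb(TOTAL_PARAMS, FP8_BYTES)
# Whatever is left after the weights is available for KV cache and activations.
headroom_gb = REQUIRED_HBM_GB - weights_gb
# Share of the parameters that participate in each forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"weights: {weights_gb:.0f} GB, headroom: {headroom_gb:.0f} GB")
print(f"active parameters per token: {active_fraction:.1%}")
```

Under these assumptions the weights alone take 671 GB, which is why a single 8-GPU instance with 640 GB of GPU memory is not enough for the full model.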

EC2 accelerated compute instances for AI/ML

GPUs (H100, H200, B200, GB200, A100, L40S, L4, A10G)
• P5 (H100), P5e (H200), P5en (H200), P4 (A100)
• G6 (L4), G6e (L40S), G5 (A10G)
• Announced: GB200, B200

AI/ML accelerators and ASICs
• AWS Trainium, Inferentia: Trn1, Trn2, Inf1, Inf2
• Gaudi accelerator: DL1
• Cloud AI100: DL2q

Other: Standard Radeon GPU, Xilinx accelerator, Xilinx FPGA

Accelerated compute architecture
[Diagram: host with CPUs, NSC (EBS), EFA, and SSDs; a PCIe switching layer; accelerators (ML chips) connected by an ML chip interconnect]

P5 instances
Optimized for AI training and inference
• 900 GB/s NVSwitch for GPU peer-to-peer connections
• Scale-out with non-blocking interconnect
• Elastic Fabric Adapter (EFA)

Instance | GPUs | GPU memory | CPU | vCPUs | Instance memory | Networking | Local storage
P5 | 8 × NVIDIA H100 | 640 GB | AMD Milan | 192 | 2 TB | 3200 Gbps EFAv2 | 30 TB SSD
P5e | 8 × NVIDIA H200 | 1128 GB | AMD Milan | 192 | 2 TB | 3200 Gbps EFAv2 | 30 TB SSD
P5en | 8 × NVIDIA H200 | 1128 GB | Intel SPR | 192 | 2 TB | 3200 Gbps EFAv3 | 30 TB SSD
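Combining this table with the 800 GB HBM requirement stated earlier for DeepSeek-R1 in FP8 gives a quick fit check. This is a minimal sketch: the `Instance` class and `fits` helper are hypothetical names, and the check compares total GPU memory only, ignoring runtime overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    gpus: int
    gpu_memory_gb: int  # total across all GPUs, as listed in the table


INSTANCES = [
    Instance("P5", 8, 640),
    Instance("P5e", 8, 1128),
    Instance("P5en", 8, 1128),
]

REQUIRED_HBM_GB = 800  # DeepSeek-R1 in FP8, per the Core Capabilities slide


def fits(instance: Instance, required_gb: int = REQUIRED_HBM_GB) -> bool:
    """True if the instance's total GPU memory meets the requirement."""
    return instance.gpu_memory_gb >= required_gb


candidates = [i.name for i in INSTANCES if fits(i)]
per_gpu_gb = {i.name: i.gpu_memory_gb / i.gpus for i in INSTANCES}

print("single-instance candidates:", candidates)
print("memory per GPU (GB):", per_gpu_gb)
```

The result is consistent with the deck's choice of ml.p5e.48xlarge as the default instance for DeepSeek-R1.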

Bedrock Marketplace implementation
• Bedrock Marketplace deploys the core (full-size) DeepSeek-R1 model to managed endpoints
• Complete code samples and step-by-step deployment guides are provided for quick implementation
• Standard Bedrock security and monitoring features apply

Bedrock Marketplace delivers 100+ models from 30+ providers
Providers shown: EVOLUTIONARY SCALE, WIDN, CAMB.AI, GRETEL, ARCEE AI, PREFERRED NETWORKS, WRITER, UPSTAGE, NCSOFT, STOCKMARK, KARAKURI, JOHN SNOW LABS, LIQUID, DATABRICKS, CYBERAGENT, HUGGING FACE, STABILITY AI, LG AI RESEARCH, MISTRAL AI, SNOWFLAKE, NVIDIA, DEEPSEEK

Prerequisite: Increase your ml.p5e.48xlarge limits before deployment
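One way to check and raise the limit programmatically is the Service Quotas API. The sketch below is hedged: it assumes the relevant quota lives under the `sagemaker` service code and that its name contains both the instance type and the phrase "endpoint usage", so verify the exact quota name in your account. `find_quota` is a hypothetical helper; the boto3 calls require AWS credentials and are kept in separate functions so they only run when called.

```python
def find_quota(quotas, instance_type, usage="endpoint usage"):
    """Return the first quota whose name mentions the instance type and usage."""
    for quota in quotas:
        name = quota.get("QuotaName", "")
        if instance_type in name and usage in name:
            return quota
    return None


def list_sagemaker_quotas():
    """Fetch all SageMaker quotas (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("service-quotas")
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


def request_increase(quota_code, desired_value):
    """Submit a quota increase request (requires boto3 and AWS credentials)."""
    import boto3

    client = boto3.client("service-quotas")
    return client.request_service_quota_increase(
        ServiceCode="sagemaker",
        QuotaCode=quota_code,
        DesiredValue=desired_value,
    )
```

Typical usage would be `quota = find_quota(list_sagemaker_quotas(), "ml.p5e.48xlarge")`, followed by `request_increase(quota["QuotaCode"], 1.0)` if the current value is 0.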

Step 1: Find the DeepSeek-R1 model in the catalog

Step 2: Set options (ml.p5e.48xlarge by default) and deploy

Step 3: Test in the Playground or call the InvokeModel API
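For the API route, a minimal sketch with boto3 might look like the following. The request body schema (`inputs` plus a `parameters` object) is an assumption about the serving container behind the Marketplace endpoint, and the default sampling values are illustrative; check the model's documentation for the exact fields. The endpoint ARN is passed as `modelId`, and `build_request_body` and `invoke` are hypothetical helper names.

```python
import json


def build_request_body(prompt, max_new_tokens=512, temperature=0.6, top_p=0.9):
    """Serialize a text-generation request (assumed schema, see lead-in)."""
    return json.dumps(
        {
            "inputs": prompt,
            "parameters": {
                "max_new_tokens": max_new_tokens,
                "temperature": temperature,
                "top_p": top_p,
            },
        }
    )


def invoke(endpoint_arn, prompt, region_name=None):
    """Call InvokeModel on a deployed endpoint (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("bedrock-runtime", region_name=region_name)
    response = client.invoke_model(
        modelId=endpoint_arn,
        body=build_request_body(prompt),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a streaming object; read and decode it as JSON.
    return json.loads(response["body"].read())
```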

Tips: Use the proper chat template (model tokenizer)
Example with DeepSeek-R1-Distill-Llama-8B (via Bedrock CMI)

When using the Bedrock Playground, we must add the proper chat template tags to the prompt for optimal results. E.g.:

    <｜begin▁of▁sentence｜><｜User｜>A man has 53 socks in his drawer: 21 identical blue, 15 identical black and 17 identical red. The lights are out, and he is completely in the dark. How many socks must he take out to make 100 percent certain he has at least one pair of black socks?<｜Assistant｜>

When using the InvokeModel API, we must configure the proper tokenizer to apply the chat template. E.g.:

    from transformers import AutoTokenizer

    # hf_model_id, test_prompt, and continuation (a bool) are supplied by the caller
    tokenizer = AutoTokenizer.from_pretrained(hf_model_id)
    messages = [{"role": "user", "content": test_prompt}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=not continuation)

[Screenshots: bad quality output vs. good quality output]
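For the single-turn case shown in the Playground example, the tags can also be added with plain string formatting. This is a sketch that covers only one user turn with the tag strings copied from the slide; system prompts and multi-turn conversations are better left to the tokenizer's own chat template. `format_single_turn` is a hypothetical helper name.

```python
# Special tokens copied verbatim from the slide's example prompt.
BOS = "<｜begin▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"


def format_single_turn(user_message: str) -> str:
    """Wrap one user message so the model starts generating the assistant turn."""
    return f"{BOS}{USER}{user_message}{ASSISTANT}"


prompt = format_single_turn("How many socks must he take out?")
print(prompt)
```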

DeepSeek-R1 Responsible AI concerns
Amazon Bedrock Guardrails (through the ApplyGuardrail API) can provide an extra layer of security and responsible AI measures
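A minimal sketch of calling ApplyGuardrail with boto3 follows. It assumes a guardrail has already been created; the guardrail ID and version are placeholders supplied by the caller, and `build_guardrail_request`, `is_blocked`, and `check_text` are hypothetical helper names. The same call can screen the user prompt (`source="INPUT"`) or the model's response (`source="OUTPUT"`).

```python
def build_guardrail_request(guardrail_id, guardrail_version, text, source="INPUT"):
    """Build keyword arguments for the ApplyGuardrail call."""
    if source not in ("INPUT", "OUTPUT"):
        raise ValueError("source must be 'INPUT' or 'OUTPUT'")
    return {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
        "source": source,
        "content": [{"text": {"text": text}}],
    }


def is_blocked(response):
    """True if the guardrail intervened on the evaluated text."""
    return response.get("action") == "GUARDRAIL_INTERVENED"


def check_text(guardrail_id, guardrail_version, text, source="INPUT"):
    """Evaluate text against a guardrail (requires boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("bedrock-runtime")
    request = build_guardrail_request(guardrail_id, guardrail_version, text, source)
    return client.apply_guardrail(**request)
```

Because the guardrail check is a separate call from model invocation, it can wrap models served through Bedrock Marketplace, Custom Model Import, or self-managed endpoints alike.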

Enterprise Protection
• Enterprise-grade security features built-in
• Complete data privacy when using AWS services
• No data sharing with model providers
• End-to-end encryption for all operations
• Access controls and governance features
• Compliance with AWS security standards

Critical Concerns
• Models hosted by AWS without any communication with DeepSeek servers or APIs
• No customer data used to improve base models
• Enterprise data protection capabilities
• Privacy control through AWS services

Model Options
• Distilled models maintain most core capabilities while reducing latency and cost
• Optimized for different computational and performance requirements
• DeepSeek-R1-Distill-Llama offered in 8B and 70B versions
• DeepSeek-R1-Distill-Qwen available in 1.5B, 7B, 14B, 32B variants (SageMaker AI only)

Custom Model Import implementation
• Bedrock Custom Model Import enables DeepSeek deployment
• Support for the Llama 8B and 70B distilled DeepSeek-R1 variants
• Complete code samples and step-by-step deployment guides are provided for quick implementation
• Standard Bedrock security and monitoring features
• On-demand pricing is billed in 5-minute windows, starting from the first successful invocation
• Expect cold-start latency and time to scale up or down
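Because of the cold start, the first invocations after idle time may fail until the model is loaded, so client code usually wraps the call in a retry loop. The sketch below is generic: `ModelNotReady` is a hypothetical stand-in for whatever "not ready" error your client surfaces, and the attempt count and delays are illustrative defaults. The `sleep` function is injectable so the demo at the bottom runs instantly.

```python
import time


class ModelNotReady(Exception):
    """Hypothetical stand-in for a 'model is still loading' error."""


def invoke_with_retry(invoke, max_attempts=5, base_delay_s=2.0,
                      sleep=time.sleep, retry_on=(ModelNotReady,)):
    """Call invoke(), retrying with exponential backoff on 'not ready' errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Delays grow as base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))


# Demo with a fake model that becomes ready on the third call.
calls = {"n": 0}
delays = []


def flaky_invoke():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ModelNotReady("warming up")
    return "ok"


result = invoke_with_retry(flaky_invoke, sleep=delays.append)
print(result, delays)
```

In real use, `invoke` would be a closure around the InvokeModel call and `retry_on` would list the client's actual exception type.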

Trn1/Trn2 instances
Powered by AWS Trainium/Trainium2 custom ML chips
• Optimized for large-scale distributed training workloads
• Trn2 UltraServers with extended NeuronLink for trillion-parameter AI
• Neuron Kernel Interface (NKI) for custom operators

Instance | Accelerators | Accelerator memory | vCPUs | Instance memory | Networking
trn1.32xlarge | 16 | 512 GB | 128 | 512 GB | 800 Gbps EFAv2
trn1n.32xlarge | 16 | 512 GB | 128 | 512 GB | 1600 Gbps EFAv2
trn2.48xlarge | 16 | 1.5 TB | 192 | 2 TB | 3.2 Tbps EFAv3
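The table also allows a rough estimate of how many Trainium accelerators a distilled model needs for its weights. This sketch uses the trn1.32xlarge row (512 GB across 16 accelerators) and assumes BF16 weights at 2 bytes per parameter. It counts weights only, so real deployments need headroom for KV cache and activations, and the runtime may restrict which parallelism degrees are valid. `min_accelerators` is a hypothetical helper.

```python
import math

TRN1_ACCELERATORS = 16
TRN1_ACCELERATOR_MEMORY_GB = 512  # total, from the trn1.32xlarge row
PER_ACCELERATOR_GB = TRN1_ACCELERATOR_MEMORY_GB / TRN1_ACCELERATORS


def min_accelerators(num_params, bytes_per_param=2.0,
                     per_accelerator_gb=PER_ACCELERATOR_GB):
    """Lower bound on accelerators needed to hold the weights alone."""
    weights_gb = num_params * bytes_per_param / 1e9
    return math.ceil(weights_gb / per_accelerator_gb)


for label, params in [("Distill-Llama-8B", 8e9), ("Distill-Llama-70B", 70e9)]:
    print(label, "needs at least", min_accelerators(params), "accelerator(s)")
```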

AWS Trainium architecture
• The Tensor Engine is based on a power-optimized systolic array
• AWS Neuron SDK supports typical architectures such as Llama
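To illustrate the systolic-array idea in general terms, the sketch below simulates an output-stationary array in plain Python: each processing element (i, j) keeps one accumulator, and operands arrive skewed in time so that element k of the dot product reaches it at cycle i + j + k. This is a textbook model chosen for illustration, not a description of Trainium's actual dataflow, which the deck does not specify.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated output-stationary array.

    Returns the product and the number of cycles the skewed schedule takes.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    # The last operand pair reaches PE (n-1, m-1) at cycle n + m + k_dim - 3.
    total_cycles = n + m + k_dim - 2
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                k = t - i - j  # which operand pair arrives at PE (i, j) now
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, total_cycles


product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, "in", cycles, "cycles")
```

The point of the structure is that every processing element does one multiply-accumulate per cycle using data passed from its neighbors, which keeps memory traffic low relative to compute.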

Summary: DeepSeek-R1 deployment options on AWS
1. Amazon Bedrock Marketplace (Amazon SageMaker JumpStart) for the DeepSeek-R1 model
2. Amazon Bedrock Custom Model Import for the DeepSeek-R1-Distill models
3. Amazon EC2 Trn1 instances for the DeepSeek-R1-Distill models

DeepSeek on AWS Blog: https://aws.amazon.com/blogs/aws/deepseek-r1-models-now-available-on-aws/

Further reading
• DeepSeek
  • Anthropic CEO Dario Blog
    https://darioamodei.com/on-deepseek-and-export-controls
• Startup Customer Case Studies on AWS
  • Sakana AI
    https://aws.amazon.com/startups/learn/letting-nature-lead-how-sakana-ai-is-transforming-model-building?lang=en-US
  • ELYZA (Llama2 Speculative Decoding on AWS Inferentia2 chip)
    https://aws.amazon.com/jp/blogs/startup/tech-interview-elyza-2024/
  • LLM Development on Trn1
    https://aws.amazon.com/jp/blogs/machine-learning/unlocking-japanese-llms-with-aws-trainium-innovators-showcase-from-the-aws-llm-development-support-program/
