Title: Deploying and Scaling LLMs on AWS: A Practical Guide to Infrastructure Choices
Abstract:
Foundation models and LLMs now range from billions to trillions of parameters, presenting unique deployment challenges. Organizations must navigate tradeoffs among cost, operational efficiency, and implementation complexity.
This talk provides a practical guide to deploying LLMs on AWS across hardware accelerators, including GPUs and purpose-built AI chips (AWS Trainium and AWS Inferentia). Yoshitaka will offer step-by-step guidance for deployment, using real-world examples such as DeepSeek-R1 and its Distill variants, along with best practices for optimizing cost and performance at scale.
Bio:
Yoshitaka Haribara is a Sr. GenAI/Quantum Startup Solutions Architect at AWS and a Visiting Associate Professor at the Center for Quantum Information and Quantum Biology (QIQB), The University of Osaka. At AWS, he works with leading Japanese generative AI startups, including Sakana AI, ELYZA, and Preferred Networks (PFN). He also supports Japanese model providers in developing Japanese LLMs and listing them on Amazon Bedrock Marketplace, including the PFN, Stockmark, and Karakuri LLMs. Drawing on his background in combinatorial optimization with quantum optical devices, Yoshitaka also guides customers in leveraging Amazon Braket for quantum applications and works to bring Japanese quantum hardware to AWS. He holds a Ph.D. in Mathematical Informatics from The University of Tokyo and a B.S. in Mathematics from The University of Osaka.
Event page:
https://lu.ma/2whjtkdi?tk=xtQ5qR