RAG Architecture - From Concept to Implementation

Agenda
- Generative AI on AWS
- RAG Concepts
- Vector Embedding for Semantic Search
- Vector Databases in AWS
- Knowledge Bases for Amazon Bedrock

Sungmin Kim

March 21, 2024

Transcript

  1. © 2024, Amazon Web Services, Inc. or its affiliates.

    AWS Unicorn Day, Seoul | March 20, 2024
  2. RAG Architecture: From Concept to Implementation

    Sungmin Kim, Sr. AI/ML Specialist SA, AWS
  3. Agenda

    • Generative AI on AWS
    • RAG Concepts
    • Vector Embedding for Semantic Search
    • Vector Databases in AWS
    • Knowledge Bases for Amazon Bedrock
  4. Introduction to generative AI

    Image prompt: “A golden retriever wearing glasses and a hat in a portrait painting”

    Chat:
    > Question: What is generative AI?
    Generative AI is a branch of AI that focuses on creating new data. It is a subset of machine learning. The goal of generative AI is to create new data that is similar to the data that was used to train the model.
  5. How does generative AI work?

    Text input -> Foundation model -> Output

    • Text generation model (also known as large language model): “Summarize this article …….” -> [Text] “………..”
    • Image generation model: “a photo of an astronaut riding a horse on mars” -> [Image]
    • Video: “A young couple walking in rain.” -> [Video]
    • Audio: “Children singing nature songs” -> [Audio]
    • Code generation model: “Write Python code to sort array …” -> [Code]
  6. Generative AI is powered by Foundation Models (FMs)

    • ML models: Labeled data -> Train -> ML models -> Deploy -> Tasks
    • Foundation model: Unlabeled data -> Pretrain -> Foundation model -> Adapt -> Tasks
    • Tasks in both cases: Text generation, Summarization, Information extraction, Q&A, Chatbot
  7. Generative AI and Foundation Models

    Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Like all AI, generative AI is powered by ML models: very large models that are pre-trained on vast amounts of data and commonly referred to as Foundation Models (FMs).

    • NLP: Natural Language Processing
    • CV: Computer Vision
    • LLM: Large Language Model
    • FM: Foundation Model
  8. How do I access Foundation Models on AWS?

    • Amazon SageMaker JumpStart: Prompt / text embeddings and Fine-tune -> API Layer (SageMaker Endpoint) -> Foundation Models. Built on SageMaker JumpStart (model hub, deploy, fine-tune), SageMaker Training and Inference, and Accelerated Computing (Trn1(n), Inf2, P4d, P5).
    • Amazon Bedrock: Prompt / text embeddings and Fine-tune -> API Layer (Amazon Bedrock) -> Foundation Models.
  9. SageMaker JumpStart
  10. SageMaker JumpStart: ML hub for SageMaker Customers

    400+ algorithms and pre-trained, state-of-the-art, open-source models from PyTorch Hub, TensorFlow Hub, Hugging Face, etc.

    • Vision
      Tasks: Image classification, Image embedding, Object detection, Semantic segmentation
      Algorithms/models: ResNet, Inception, MobileNet, SSD, Faster RCNN, YOLO, and more
    • Text
      Tasks: Sentence classification, Text classification, Question answering, Summarization, Text generation, Translation, Named-entity recognition
      Algorithms/models: FLAN T5 XL, AlexaTM, Stable Diffusion, BERT, RoBERTa, DistilBERT, DistilBART xsum, GPT2, ELECTRA, and more
    • Tabular
      Tasks: Classification, regression, time-series
      Algorithms/models: LightGBM, CatBoost, AutoGluon, TabTransformer, XGBoost, DeepAR
    • Audio
      Tasks: Audio embedding
      Algorithms/models: TRILL, TRILLsson, TRILL-Distilled, FRILL
  11. Browse and search SageMaker JumpStart content

    • Search for topics or problem types, and get relevant results across all content
    • Browse by content type to explore solutions, models, example notebooks, blogs, and video tutorials
  12. Easy deploy experience

    • Inference instance type
    • Security Settings
  13. Easy fine-tune experience

    • Labeled data set path
    • Training instance type
    • Hyper-parameters & Security settings
  14. Three ways to use Foundation models through SageMaker JumpStart

    SageMaker Studio One-click deploy SageMaker Notebooks SageMaker Console Gated-preview
  15. Amazon Bedrock

    Single API to build with generative AI
  16. Amazon Bedrock supports leading foundation models

    • Amazon Titan: Summarization, image and text generation and search, Q&A
    • Command & Embed: Text generation, search, classification
    • Jurassic-2: Contextual answers, summarization, paraphrasing
    • Claude 3, Claude 2.1 & Claude Instant: Multimodal reasoning, vision capabilities, translation, summarization, writing, coding
    • Stable Diffusion XL 1.0: High-quality images and art
    • Llama 2: Dialogue use cases and language tasks
    • Mistral 7B & Mixtral 8x7B (Mistral AI): Text summarization, Q&A, Text classification, Text completion, code generation
  17. Single API to build with generative AI

    • Pass the model ID, type of content, and body of the request
    • Body includes the prompt and execution parameters
    • Returns model response and metadata
    • Handles text-to-text, text-to-image, image-to-image, and more
    • Supports current and future Amazon Titan models, third-party models, and even fine-tuned models
  18. Bedrock core API: InvokeModel

    • Amazon Titan models
    • Third-party models
    • Fine-tuned models

    bedrock.invoke_model(
        modelId = model_id,
        contentType = "…",
        accept = "…",
        body = body)
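
    A minimal sketch of the InvokeModel call for an Amazon Titan text model. The helper name `invoke_titan_text`, the default model ID, and the stub client are illustrative assumptions, not part of the deck; the body fields (`inputText`, `textGenerationConfig`) and the `results[0].outputText` response field follow the Titan Text request format as commonly documented, so verify them for the model you use. The client is injected so the sketch runs without AWS credentials; in practice it would be `boto3.client("bedrock-runtime")`.

```python
import io
import json


def invoke_titan_text(client, prompt, model_id="amazon.titan-text-express-v1",
                      max_tokens=256, temperature=0.0):
    """Call InvokeModel once and return the generated text."""
    # The body carries the prompt plus execution parameters, serialized as JSON.
    body = json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,
        },
    })
    response = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=body,
    )
    # response["body"] is a readable stream holding the JSON model response.
    payload = json.loads(response["body"].read())
    return payload["results"][0]["outputText"]


class StubBedrockClient:
    """Hypothetical offline stand-in for boto3.client("bedrock-runtime")."""

    def invoke_model(self, **kwargs):
        self.last_request = kwargs  # kept so the request can be inspected
        payload = {"results": [{"outputText": "stubbed answer"}]}
        return {"body": io.BytesIO(json.dumps(payload).encode("utf-8"))}


stub = StubBedrockClient()
print(invoke_titan_text(stub, "What is generative AI?"))  # stubbed answer
```

    Other providers on Bedrock expect a different JSON body, so only the body construction and the response parsing would change per model family.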
  19. Bedrock core API: Streaming responses

    import json

    response = bedrock.invoke_model_with_response_stream(
        modelId = model_id,
        body = body)
    stream = response.get('body')
    if stream:
        # Each event carries one chunk of the response as JSON bytes.
        for event in stream:
            chunk = event.get('chunk')
            if chunk:
                print(json.loads(chunk.get('bytes').decode()))

    Flow: (1) invoke with stream -> Amazon Bedrock; (2) chunk 1 … chunk N are returned

    • Users can start reading the response as soon as the first chunk is available
    • For more information, see Amazon Bedrock Developer Guide
  20. invoke_model() – Titan Embeddings

    “This is just some input I want to get embeddings for” -> [0.752, 0.271, -0.325, 0.195, 0.837, -0.945] (n-dimensional vector)
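
    The embedding call can be sketched as follows. The helper `embed_text`, the default model ID, and the stub client are assumptions made for illustration; the request field `inputText` and response field `embedding` follow the Titan Embeddings format as commonly documented. The stub simply returns the six-number vector shown on the slide, whereas a real model returns a much longer vector.

```python
import io
import json


def embed_text(client, text, model_id="amazon.titan-embed-text-v1"):
    """Return the embedding vector for `text` via InvokeModel."""
    response = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]


class StubEmbeddingClient:
    """Hypothetical offline stand-in for boto3.client("bedrock-runtime")."""

    def invoke_model(self, **kwargs):
        # Fixed vector copied from the slide; a real model computes it from the text.
        payload = {"embedding": [0.752, 0.271, -0.325, 0.195, 0.837, -0.945]}
        return {"body": io.BytesIO(json.dumps(payload).encode("utf-8"))}


vector = embed_text(StubEmbeddingClient(),
                    "This is just some input I want to get embeddings for")
print(len(vector), vector[:3])
```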
  21. Integrated with LangChain

    • New LLM and embeddings class for Amazon Bedrock
    • Includes code for using the LLM class in a conversation chain
    • Includes code for creating an embedding from text

    pip install langchain

    from langchain import Bedrock
    from langchain.embeddings import BedrockEmbeddings

    llm = Bedrock()
    print(llm("what is generative AI?"))
  22. Two ways to access Foundation Models on AWS

    Amazon SageMaker JumpStart
    • Machine learning (ML) hub with foundation models (public and proprietary)
    • Built-in algorithms and prebuilt ML solutions
    • Deploy FM as SageMaker Endpoint (hosting)
    • Fine-tuning leverages SageMaker Training jobs
    • Choose SageMaker managed accelerated computing instance
    • Stack: Prompt / text embeddings and Fine-tune -> API Layer (SageMaker Endpoint) -> Foundation Models; SageMaker JumpStart (model hub, deploy, fine-tune); SageMaker Training and Inference; Accelerated Computing (Trn1(n), Inf2, P4d, P5)

    Amazon Bedrock
    • The easiest way to build and scale generative AI applications with foundation models (FMs)
    • Access directly or fine-tune foundation model using API
    • Serverless
    • Foundation model providers: Amazon, Anthropic, AI21, Stability, and many more
    • Stack: Prompt / text embedding and Fine-tune -> API Layer (Amazon Bedrock) -> Foundation Models
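  23. We need to customize a Foundation Model, why?

    • Hallucination
      Prompt: “Tell me about the incident, recorded in the Annals of the Joseon Dynasty, in which King Sejong threw a MacBook Pro.”
      Response: “The incident of King Sejong throwing a MacBook Pro is recorded in the Annals of the Joseon Dynasty. It took place in 1443 (the 25th year of Sejong's reign) at a banquet held in the palace rear garden with the scholars of the Hall of Worthies (Jiphyeonjeon). During the banquet, Sejong suddenly threw the MacBook Pro to the floor.”
    • Current Knowledge
      Prompt: “Tell me the closing price of Amazon stock on March 20, 2024.”
      Response: “I'm sorry, but my information is current only as of March 2022, so I cannot know the latest stock prices. To check the latest information, please refer to a real-time stock quote site or financial news.”
    • Closed-domain Knowledge
      Prompt: “I am an Amazon employee; tell me the manual for responding to customer inquiries.”
      Response: “Unfortunately, I cannot provide the manual directly. If you work at Amazon, you can get accurate information and guidance from internal training materials, your team leader, or your training coordinator.”
  24. We need to customize a Foundation Model, why?

    • Specific Task
    • Closed-domain knowledge
    • Current Knowledge
    • Improving the performance/quality
    • Reduce likelihood of hallucinations

    Initial Pretraining: Common Crawl, Wikipedia -> Foundation Model
    Prompt (Question): Which FMs are supported by Amazon Bedrock?
    Response (Answer): Radio X FM & TV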
  25. Prompt Engineering, new way of using ML!

    Input (Prompt): Task Description + Example Input 1 + Example Input 2 + Example Input 3 + Output Indicator -> LLM -> Output (Generated Response)
  26. Prompt Engineering, new way of using ML!

    Input:
    • Task Description: Movie review sentiment classifier.
    • Examples:
      Review: "I loved this movie!" This review is positive.
      Review: "I am not sure, I think the movie was fine." This review is neutral.
      Review: "This movie was a waste of time and money" This review is negative.
    • Output indicator: Review: "I really had fun watching this movie" This review is

    Output (Model): Positive
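
    The prompt layout above (task description, labeled examples, output indicator) can be assembled programmatically. This is only a sketch: the function name `build_few_shot_prompt` and the exact line layout are assumptions, and the resulting string would be sent to whichever LLM is in use.

```python
def build_few_shot_prompt(task, examples, new_review, indicator="This review is"):
    """Assemble task description + labeled examples + output indicator."""
    lines = [task, ""]
    for review, label in examples:
        lines.append(f'Review: "{review}"')
        lines.append(f"{indicator} {label}.")
        lines.append("")
    # The prompt ends with the output indicator so the model completes the label.
    lines.append(f'Review: "{new_review}"')
    lines.append(indicator)
    return "\n".join(lines)


examples = [
    ("I loved this movie!", "positive"),
    ("I am not sure, I think the movie was fine.", "neutral"),
    ("This movie was a waste of time and money", "negative"),
]
prompt = build_few_shot_prompt(
    "Movie review sentiment classifier.",
    examples,
    "I really had fun watching this movie",
)
print(prompt)
```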
  27. How to do Prompt Augmentation?

    Prompt + Context -> Generated Response

    Context:
    • “The MacBook Pro is Apple's line of Macintosh notebook computers, first unveiled at the Macworld Expo on January 10, 2006.”
    • “Sejong was the fourth king of Joseon. He reigned from 1418 to 1450; he was invested as crown prince in June 1418 and ascended the throne in August upon Taejong's abdication.”
  28. Retrieval-Augmented Generation (RAG) with LLMs

    1. Prompt + Query
    2. Query: search relevant information in the knowledge sources
    3. Relevant information for enhanced context
    4. Prompt + Query + Enhanced context, sent to the LLM endpoint
    5. Generated text response
  29. How does RAG work?

    1. User query: “What is HDL cholesterol?”
    2. Query: search relevant information in the medical knowledge sources
    3. Relevant information for enhanced context: “Put simply, HDL cholesterol can be called 'good' cholesterol. A low HDL cholesterol level is a cardiovascular risk factor, so HDL cholesterol levels should be raised.”
    4. Query + Enhanced context, sent to the foundation model through the API layer
    5. Generated text response: “HDL cholesterol carries the cholesterol remaining in the blood to the liver so that it can be excreted, which is why it is called the 'good cholesterol' that cleans the blood vessels.”
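
    The five steps can be sketched end to end in a few lines. Everything here is a toy under stated assumptions: `retrieve` ranks by shared words (a stand-in for real full-text or vector search), the prompt template is invented for illustration, and `echo_llm` is a placeholder for an actual LLM endpoint call.

```python
import re


def tokenize(text):
    """Lowercase word tokens, punctuation removed."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Step 2-3: rank documents by the number of words shared with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_docs):
    """Step 4: prompt + query + enhanced context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")


def answer(query, documents, llm, top_k=1):
    """Steps 1-5: retrieve, augment, generate."""
    context_docs = retrieve(query, documents, top_k)
    return llm(build_prompt(query, context_docs)), context_docs


knowledge = [
    "HDL cholesterol is often called good cholesterol; a low level is a cardiovascular risk factor.",
    "The MacBook Pro is a line of Apple notebook computers first unveiled in 2006.",
]


def echo_llm(prompt):
    # Placeholder for a real model call; it only reports the prompt size.
    return f"[LLM would answer from a prompt of {len(prompt)} characters]"


reply, used = answer("What is HDL cholesterol?", knowledge, echo_llm)
print(used[0])
print(reply)
```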
  30. Search Relevant Information for enhanced Context
  31. Improving Search Results with Semantic search

    • Without semantic search: Golf Shoes Bag, Golf Shoe Case, Stretcher for Golf Shoes, Plastic Golf Tees
    • With semantic search: Golf Shoes, bright color golf shoes
  32. How to capture Semantic Relationships between Words?

    • Male-Female: King / Queen, Man / Woman
    • Verb tense: walking / walked, swimming / swam
    • Country-Capital: Spain / Madrid, Korea / Seoul, Japan / Tokyo, China / Beijing, Russia / Moscow, U.S.A / Washington D.C
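
    When words are represented as vectors, a relationship such as Male-Female becomes a direction, so `king - man + woman` lands near `queen`. The sketch below uses hand-made two-dimensional vectors (axis 0 loosely "royalty", axis 1 loosely "gender"); these numbers are invented for illustration and do not come from any trained model.

```python
import math

# Hypothetical toy vectors, chosen by hand for the demonstration.
toy_vectors = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (0.9, 0.8),
    "girl": (-0.1, -0.9),
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman", toy_vectors))  # queen
```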
  33. Words as Vectors

    “I am at AWS Unicorn Day 2024” -> [0.743, 0.720, -0.325, 0.195, 0.835, -0.945] (n-dimensional vector)
  34. What is a vector embedding?

    • A numerical representation of words or sentences, used in NLP
    • NLP models can easily perform tasks such as querying, classification, and applying machine learning algorithms on textual data
    • “I am at AWS Unicorn Day 2024” -> [0.743, 0.720, -0.325, 0.195, 0.835, -0.945] (n-dimensional vector with magnitude and direction)
  35. Vector similarity search

    • Similar vectors = similar meaning and context
    • Deliver similarity search results based on vector proximity by comparing vector distances
    • Amazon Music recommendations: Your favorite song -> Song with a high chance of being liked
  36. Similarity Metrics for Vector search

    • Euclidean (L2): Useful for counts/measurements and recommendation systems
    • Dot product: Useful for collaborative filtering
    • Cosine similarity: Useful for semantic search and document classification
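
    The three metrics differ in how they treat magnitude and direction. In this small sketch (function names and sample vectors are chosen for illustration), `b` points in the same direction as `a` but is twice as long: cosine similarity sees them as identical, while Euclidean distance and dot product still reflect the difference in length.

```python
import math


def euclidean_distance(a, b):
    """L2 distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a, b):
    """Larger means more similar; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the normalized vectors; depends on direction only."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the magnitude

print(euclidean_distance(a, b))  # 3.0
print(dot_product(a, b))         # 18.0
print(cosine_similarity(a, b))   # 1.0
```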
  37. Generative AI models represent unstructured data as vector embeddings

    Unstructured data (Text, Image, Audio) -> Embedding model -> Vector embeddings
  38. Vector embedding for Semantic search (Vector search)

    Unstructured data (Text, Image, Audio) -> Embedding model -> Vector embeddings -> Search space
    Semantic search ≈ Vector search
  39. Vector Database as a Semantic Search Engine

    Unstructured data (Text, Image, Audio) -> Embedding model -> Vector embeddings -> Vector database (search space)
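
    Conceptually, a vector database stores (item, embedding) pairs and answers nearest-neighbor queries. The brute-force class below is a teaching sketch only: the class name, the product titles, and the two-dimensional vectors are made up, and production engines use approximate indexes rather than scanning every item.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """In-memory store with exact (brute-force) cosine search."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, tuple(vector)))

    def search(self, query_vector, k=1):
        # Score every stored vector against the query and keep the top k.
        ranked = sorted(self._items,
                        key=lambda item: cosine(item[1], query_vector),
                        reverse=True)
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore()
# Hypothetical embeddings: axis 0 ~ "is a shoe", axis 1 ~ "is an accessory".
store.add("golf shoes", (0.9, 0.1))
store.add("golf shoes bag", (0.5, 0.8))
store.add("plastic golf tees", (0.1, 0.9))

results = store.search((1.0, 0.0), k=2)
print(results)  # ['golf shoes', 'golf shoes bag']
```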
  40. RAG using Vector Database

    Data ingestion workflow, followed by the RAG flow:
    1. Prompt + Query
    2. Query: search relevant information in the knowledge sources
    3. Relevant information for enhanced context
    4. Prompt + Query + Enhanced context, sent to the LLM endpoint
    5. Generated text response
  41. RAG using Vector Database

    Data ingestion workflow
  42. invoke_model() – Titan Embeddings

    “This is just some input I want to get embeddings for” -> Amazon Bedrock -> [0.752, 0.271, -0.325, 0.195, 0.837, -0.945] (n-dimensional vector)
  43. Document Chunking Strategies for RAG

    Data ingestion workflow: Document -> Document chunks -> Embeddings model -> Vector embeddings -> Vector store
  44. Document Chunking Strategies for RAG

    Three strategies illustrated on the same sample passage:
    • Sentence splitting
    • Page splitting
    • Page overlapping

    Sample passage: “Generative AI (GenAI) and large language models (LLMs), soon to be available through Amazon Bedrock and Amazon Titan, are changing the way developers and enterprises solve traditionally complex problems in natural language processing and understanding. Benefits of LLMs include building more powerful and engaging conversational AI experiences in customer service applications and improving employee productivity through more intuitive and accurate responses. For these use cases, however, it is important that GenAI applications implementing conversational experiences meet two key criteria: limiting responses to company data, which mitigates model hallucinations (incorrect statements), and filtering responses according to the end user's content access permissions.”
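
    A rough sketch of two of these strategies. `split_sentences` does sentence splitting with a simple regular expression; `chunk_words` uses fixed-size word windows as a stand-in for page splitting, and with `overlap > 0` consecutive chunks share words, which approximates page overlapping. Both function names and the word-based notion of a "page" are assumptions for illustration.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_words(text, size, overlap=0):
    """Fixed-size word windows; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "RAG retrieves context. Then it generates! Done?"
print(split_sentences(sample))
print(chunk_words("one two three four five six seven", size=4, overlap=2))
```

    Overlap trades storage for recall: a fact that straddles a chunk boundary still appears whole in at least one chunk.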
  45. Vector Databases for RAG in AWS

    Vector Search:
    • Amazon OpenSearch Service
    • Amazon OpenSearch Serverless
    • Amazon DocumentDB (with MongoDB compatibility)
    • Amazon Aurora PostgreSQL
    • Amazon RDS for PostgreSQL
    • Amazon MemoryDB for Redis
  46. Knowledge Databases for RAG in AWS

    • Services: Amazon Kendra, Amazon OpenSearch Service, Amazon OpenSearch Serverless, Amazon DocumentDB (with MongoDB compatibility), Amazon Aurora PostgreSQL, Amazon RDS for PostgreSQL, Amazon MemoryDB for Redis
    • Search types: Full-text Search, Vector Search, Hybrid Search
  47. RAG Implementation Patterns
  48. Retrieval-Augmented Generation (RAG) with LLMs

    1. Prompt + Query
    2. Query: search relevant information in the knowledge sources
    3. Relevant information for enhanced context
    4. Prompt + Query + Enhanced context, sent to the LLM endpoint
    5. Generated text response
  49. Knowledge Databases for RAG in AWS

    • Services: Amazon Kendra, Amazon OpenSearch Service, Amazon OpenSearch Serverless, Amazon DocumentDB (with MongoDB compatibility), Amazon Aurora PostgreSQL, Amazon RDS for PostgreSQL, Amazon MemoryDB for Redis
    • Search types: Full-text Search, Vector Search, Hybrid Search
  50. Full-text Search – Amazon Kendra

    Components: User, Generative AI Application, Amazon Kendra (Knowledge Sources), Amazon Bedrock (Text Generation)
    1. Prompt + Query
    2. Query
    3. Relevant Information for Enhanced Context
    4. Prompt + Query + Enhanced Context
    5. Generated Text Response
  51. Full-text Search – Amazon OpenSearch

    Components: User, Generative AI Application, Amazon OpenSearch Service (Knowledge Sources), Amazon Bedrock (Text Generation)
    1. Prompt + Query
    2. Query
    3. Relevant Information for Enhanced Context
    4. Prompt + Query + Enhanced Context
    5. Generated Text Response
  52. Knowledge Databases for RAG in AWS

    • Services: Amazon Kendra, Amazon OpenSearch Service, Amazon OpenSearch Serverless, Amazon DocumentDB (with MongoDB compatibility), Amazon Aurora PostgreSQL, Amazon RDS for PostgreSQL, Amazon MemoryDB for Redis
    • Search types: Full-text Search, Vector Search, Hybrid Search
  53. Vector Search – Amazon Aurora PostgreSQL

    Components: User, Generative AI Application, Amazon Bedrock (Vector Embedding), Amazon Aurora PostgreSQL (Knowledge Sources: Document Chunks, Vector embeddings), Amazon Bedrock (Text Generation)
    1. Prompt + Query
    2. Query
    3. Embedding for Query
    4. Relevant Information for Enhanced Context
    5. Prompt + Query + Enhanced Context
    6. Generated Text Response
  54. Knowledge Databases for RAG in AWS

    • Services: Amazon Kendra, Amazon OpenSearch Service, Amazon OpenSearch Serverless, Amazon DocumentDB (with MongoDB compatibility), Amazon Aurora PostgreSQL, Amazon RDS for PostgreSQL, Amazon MemoryDB for Redis
    • Search types: Full-text Search, Vector Search, Hybrid Search
  55. Hybrid (= Full-text + Vector) Search – Amazon OpenSearch

    Components: User, Generative AI Application, Amazon Bedrock (Vector Embedding), Amazon OpenSearch Service (Knowledge Sources: Document Chunks, Vector embeddings), Amazon Bedrock (Text Generation)
    1. Prompt + Query
    2. Query
    3. Embedding for Query
    4. Relevant Information for Enhanced Context
    5. Prompt + Query + Enhanced Context
    6. Generated Text Response
  56. Amazon OpenSearch - Hybrid Search

    Hybrid = Full-text search + Vector search
    • Pre-filtering, post-filtering, and efficient filtering options
    • Compound queries with various scoring and boosting options
    • Supports up to 1000 fields with various data types, including text, numeric, Boolean, geo-location

    Example query (the "bool" filter is the full-text search part; the "knn" script is the vector search part):

    {
      "query": {
        "script_score": {
          "query": {
            "bool": {
              "filter": {
                "range": { "price": { "lt": 8 } }
              }
            }
          },
          "script": {
            "lang": "knn",
            "source": "knn_score",
            "params": {
              "field": "my_vector",
              "query_value": [ 9.9, 9.9 ],
              "space_type": "l2"
            }
          }
        }
      }
    }
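
    The same query body can be built from parameters, which is convenient when the query vector comes from an embedding model at request time. The builder name `knn_script_score_query` is an assumption; the dictionary it returns mirrors the JSON on the slide. Sending it to a cluster (for example with an OpenSearch client's search call) is left out so the sketch runs offline.

```python
import json


def knn_script_score_query(field, query_vector, filter_clause, space_type="l2"):
    """Build a filtered exact k-NN query: filter first, then score by vector distance."""
    return {
        "query": {
            "script_score": {
                # Pre-filter: only documents matching this clause are scored.
                "query": {"bool": {"filter": filter_clause}},
                # Score the remaining documents against the query vector.
                "script": {
                    "lang": "knn",
                    "source": "knn_score",
                    "params": {
                        "field": field,
                        "query_value": list(query_vector),
                        "space_type": space_type,
                    },
                },
            }
        }
    }


query = knn_script_score_query(
    field="my_vector",
    query_vector=[9.9, 9.9],
    filter_clause={"range": {"price": {"lt": 8}}},
)
print(json.dumps(query, indent=2))
```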
  57. Retrieval-Augmented Generation (RAG) with LLMs

    1. Prompt + Query
    2. Query: search relevant information in the knowledge sources
    3. Relevant information for enhanced context
    4. Prompt + Query + Enhanced context, sent to the LLM endpoint
    5. Generated text response
  58. Two ways to access Foundation Models on AWS

    • Amazon SageMaker JumpStart: Prompt / text embeddings and Fine-tune -> API Layer (SageMaker Endpoint) -> Foundation Models. Built on SageMaker JumpStart (model hub, deploy, fine-tune), SageMaker Training and Inference, and Accelerated Computing (Trn1(n), Inf2, P4d, P5).
    • Amazon Bedrock: Prompt / text embeddings and Fine-tune -> API Layer (Amazon Bedrock) -> Foundation Models.
  59. Full-text Search – Amazon Kendra + SageMaker

    Components: User, Generative AI Application, Amazon Kendra (Knowledge Sources), Amazon Bedrock / Amazon SageMaker (Text Generation)
    1. Prompt + Query
    2. Query
    3. Relevant Information for Enhanced Context
    4. Prompt + Query + Enhanced Context
    5. Generated Text Response
  60. Vector Search – Amazon Aurora PostgreSQL + SageMaker

    Components: User, Generative AI Application, Amazon Bedrock / Amazon SageMaker (Vector Embedding), Amazon Aurora PostgreSQL (Knowledge Sources: Document Chunks, Vector embeddings), Amazon Bedrock / Amazon SageMaker (Text Generation)
    1. Prompt + Query
    2. Query
    3. Embedding for Query
    4. Relevant Information for Enhanced Context
    5. Prompt + Query + Enhanced Context
    6. Generated Text Response
  61. Hybrid Search – Amazon OpenSearch + SageMaker

    Components: User, Generative AI Application, Amazon Bedrock / Amazon SageMaker (Vector Embedding), Amazon OpenSearch Service (Knowledge Sources: Document Chunks, Vector embeddings), Amazon Bedrock / Amazon SageMaker (Text Generation)
    1. Prompt + Query
    2. Query
    3. Embedding for Query
    4. Relevant Information for Enhanced Context
    5. Prompt + Query + Enhanced Context
    6. Generated Text Response
  62. DEMO: Generative AI application with RAG approach

    Components: User, Generative AI Application, Amazon Kendra Index with Data Ingested, Amazon SageMaker (LLM for Text Generation)
    1. Prompt + Query
    2. Query
    3. Relevant Information for Enhanced Context
    4. Prompt + Query + Enhanced Context
    5. Generated Text Response
    6. Response from LLM to User

    (source: https://tinyurl.com/y3b7fvfh)
  63. However, when it comes to implementing RAG, there are challenges…

    • Creating vector embeddings for large volumes of data
    • Orchestration
    • Managing multiple data sources
    • Scaling retrieval mechanism
    • Coding effort
    • Incremental updates to vector store
  64. Knowledge Bases for Amazon Bedrock
  65. Knowledge bases for Amazon Bedrock

    Use Retrieval Augmented Generation (RAG)

    Diagram: Question -> Vector database + Bedrock foundation model -> Answer

    • Connect FMs to data sources including vector engine for Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL, Pinecone, and Redis Enterprise Cloud
    • Enable automatic data source detection
    • Easily retrieve relevant data and augment prompt
    • Provide citations
  66. End-to-End RAG Workflow

    • Data Ingestion Workflow: Data source -> Document chunks -> Embeddings model -> Vector store
    • Text Generation Workflow: User input -> Embeddings model -> Embedding (-0.02 0.89 -0.38 -0.53 0.95 0.17) -> Semantic search on the vector store -> Context -> Prompt augmentation -> Large Language Model -> Response
  67. RAG using Vector Database

    Data ingestion workflow: Data source -> Document Chunks -> Embeddings model -> Vector embeddings -> Vector store
  68. RAG using Knowledge Bases for Amazon Bedrock

    Data ingestion workflow, handled by Knowledge Bases for Amazon Bedrock: Data source -> Document Chunks -> Embeddings model -> Vector embeddings -> Vector store
  69. Data Ingestion Workflow (Knowledge Bases for Amazon Bedrock)

    • Choose your data source (Amazon S3)
    • Support for incremental updates
    • Multiple data file formats supported
    • Choose your chunking strategy
      • Fixed chunks
      • No chunking
      • Default (200 tokens)
    • Choose your embedding model
      • Amazon Titan
      • Cohere
    • Choose your vector store
      • OpenSearch Serverless
      • Amazon Aurora PostgreSQL
      • Pinecone
      • Redis Enterprise

    Fully managed data ingestion workflow: New data -> Data source -> Document chunks -> Embeddings model -> Vector store
  70. Fully managed data ingestion (Knowledge Bases for Amazon Bedrock)

    • Choose your data source (Amazon S3)
    • Support for incremental updates
    • Multiple data file formats supported
    • Choose your chunking strategy
      • Fixed chunks
      • No chunking
      • Default (200 tokens)
    • Choose your embedding model
      • Amazon Titan
      • Cohere
    • Choose your vector store
      • OpenSearch Serverless
      • Amazon Aurora
      • Pinecone
      • Redis Enterprise

    Fully managed data ingestion workflow: New data -> Data source -> Document chunks -> Embeddings model -> Vector store

    Automated and fully managed data ingestion using Knowledge Bases for Amazon Bedrock
  71. Retrieve and Generate (Knowledge Bases for Amazon Bedrock)

    Text Generation Workflow: User input -> Embeddings model -> Embedding (-0.02 0.89 -0.38 -0.53 0.95 0.17) -> Semantic search in Knowledge Bases for Amazon Bedrock -> Context -> Prompt augmentation -> Large Language Model -> Response
  72. RetrieveAndGenerate API (Knowledge Bases for Amazon Bedrock)

    Fully managed RAG: User query -> RetrieveAndGenerate API -> Generated response

    Inside the API:
    1. Generate query embedding
    2. Retrieve similar documents from knowledge bases
    3. Augment query with retrieved documents
    4. Generate response from LLM
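
    A hedged sketch of calling this API from Python. The request and response shapes follow the `retrieve_and_generate` operation of the boto3 `bedrock-agent-runtime` client as commonly documented; the helper name, the stub client, the knowledge base ID `KBID123456`, and the model ARN are placeholders for illustration. In practice the client would be `boto3.client("bedrock-agent-runtime")`.

```python
def ask_knowledge_base(client, question, knowledge_base_id, model_arn):
    """One call performs retrieval, prompt augmentation, and generation."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    # The generated answer plus citations pointing back to the source chunks.
    return response["output"]["text"], response.get("citations", [])


class StubAgentRuntimeClient:
    """Hypothetical offline stand-in for boto3.client("bedrock-agent-runtime")."""

    def retrieve_and_generate(self, **kwargs):
        self.last_request = kwargs  # kept so the request can be inspected
        return {"output": {"text": "stubbed answer"}, "citations": []}


stub = StubAgentRuntimeClient()
text, citations = ask_knowledge_base(
    stub,
    "What is HDL cholesterol?",
    knowledge_base_id="KBID123456",  # placeholder ID
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
)
print(text, citations)
```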
  73. Customize RAG workflows using Retrieve API (Knowledge Bases for Amazon Bedrock)

    Retrieve API: User query -> Generate query embedding -> Retrieve similar documents from knowledge bases -> Retrieved documents

    Customized RAG workflow: User input + Context (retrieved documents) -> Prompt augmentation -> Large Language Model -> Response
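
    With the Retrieve API, only the retrieval step is managed and the application owns prompt augmentation and generation. The sketch below assumes the `retrieve` operation of the boto3 `bedrock-agent-runtime` client as commonly documented; the helper names, the prompt template, the stub client, and the knowledge base ID are illustrative assumptions.

```python
def retrieve_chunks(client, query, knowledge_base_id, top_k=3):
    """Return the text of the chunks most similar to the query."""
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]


def augment_prompt(query, chunks):
    """Custom augmentation step: the application decides the prompt layout."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


class StubAgentRuntimeClient:
    """Hypothetical offline stand-in for boto3.client("bedrock-agent-runtime")."""

    def retrieve(self, **kwargs):
        self.last_request = kwargs  # kept so the request can be inspected
        return {"retrievalResults": [
            {"content": {"text": "HDL is often called good cholesterol."}, "score": 0.9},
        ]}


stub = StubAgentRuntimeClient()
chunks = retrieve_chunks(stub, "What is HDL cholesterol?", "KBID123456")  # placeholder ID
prompt = augment_prompt("What is HDL cholesterol?", chunks)
print(prompt)  # this prompt would then be sent to the LLM of your choice
```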
  74. RAG using Amazon Bedrock and OpenSearch Serverless

    Components: User, Generative AI Application, Amazon Bedrock (Vector Embedding), Amazon OpenSearch Serverless (Knowledge Sources: Document Chunks, Vector embeddings), Amazon Bedrock (Text Generation)
    1. Prompt + Query
    2. Query
    3. Embedding for Query
    4. Relevant Information for Enhanced Context
    5. Prompt + Query + Enhanced Context
    6. Generated Text Response
  75. RAG using Knowledge Bases for Amazon Bedrock

    Components: User, Generative AI Application, Knowledge Bases for Amazon Bedrock, Amazon Bedrock (Vector Embedding), Amazon OpenSearch Serverless, Amazon Bedrock (Text Generation)
    1. Prompt + Query
    2. Query
    3. Embedding for Query
    4. Relevant Information for Enhanced Context
    5. Prompt + Query + Enhanced Context
    6. Generated Text Response
  76. Knowledge Bases integration with LangChain
  77. Code example (LangChain)

    import boto3
    from langchain import Bedrock
    from langchain.retrievers.bedrock import AmazonKnowledgeBasesRetriever
    from langchain.chains import RetrievalQA

    # kb_id, claude_prompt, and query are assumed to be defined elsewhere.
    bedrock_client = boto3.client("bedrock-runtime")
    model_kwargs_claude = {"temperature": 0, "top_k": 10, "max_tokens_to_sample": 3000}
    llm = Bedrock(model_id="anthropic.claude-instant-v1",
                  model_kwargs=model_kwargs_claude,
                  client=bedrock_client)

    # Retrieve the top 3 chunks from the knowledge base for each query.
    retriever = AmazonKnowledgeBasesRetriever(
        knowledge_base_id=kb_id,
        retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 3}})

    # "stuff" places all retrieved chunks into a single prompt.
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,
        chain_type_kwargs={"prompt": claude_prompt})

    answer = qa(query)
    print(answer)
  79. Demo - Knowledge Bases for Amazon Bedrock
  80. Demo - Knowledge Bases for Amazon Bedrock
  81. RAG use cases

    • Improved content quality: e.g., helps in reducing hallucinations and connecting with recent knowledge, including enterprise data
    • Contextual chatbots and question answering: e.g., enhance chatbot capabilities by integrating with real-time data
    • Personalized search: e.g., searching based on the user's previous search history and persona
    • Real-time data summarization: e.g., retrieving and summarizing transactional data from databases or API calls
  82. Recap

    • Gen AI: Amazon SageMaker, Amazon Bedrock
    • RAG concepts
    • Vector Database
    • Knowledge Bases for Amazon Bedrock (Question -> Vector database + Bedrock foundation model -> Answer)
  83. Resources

    [1] Generative AI on Amazon SageMaker Workshop
        https://catalog.us-east-1.prod.workshops.aws/workshops/972fd252-36e5-4eed-8608-743e84957f8e/en-US
    [2] Building with Amazon Bedrock and LangChain Workshop
        https://catalog.workshops.aws/building-with-amazon-bedrock/en-US
    [3] Building a Chatbot using Amazon Bedrock and Kendra Workshop
        https://catalog.us-east-1.prod.workshops.aws/workshops/27eb3134-4f33-4689-bb73-269e4273947a/en-US
    [4] RAG Architecture Recipes with sample codes
        https://www.awsstartup.io/startup-recipe/architecture-recipes#generative-ai
  84. We look forward to your valuable feedback.

    Please complete the session survey after the talk!
  85. Thank you