The Ultimate RAG Showdown

moritalous
August 24, 2024

Transcript

  1. Self-introduction
     Name: Morita Kazuaki
     AWS Ambassador (2023-), AWS Top Engineer (2020-), AWS All Certifications Engineer (2024), AWS Community Builder (2024)
     X / Qiita / GitHub: @moritalous
     Slide image: "Jumping deer with Japanese temple", created by Amazon Titan Image Generator
  2. What is RAG?
     • RAG (Retrieval-Augmented Generation) is a technique that provides external information to generative AI so that it can produce answers.
     • It helps prevent generative AI from producing "hallucinations".
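     As a rough illustration of this retrieve-then-generate flow, a minimal sketch with placeholder functions (not code from the deck):

     # Sketch of the RAG pattern: retrieve external documents, then let the LLM
     # answer grounded on them. `retrieve` and `generate` stand in for whatever
     # search service and model are actually used.
     def answer_with_rag(question: str, retrieve, generate) -> str:
         documents = retrieve(question)        # look up external information
         return generate(question, documents)  # generate an answer from the documents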
  3. What is Knowledge bases for Amazon Bedrock?
     • A Bedrock capability for building RAG
     • Can be built using only the management console
     • Receives frequent feature updates
  4. Knowledge bases for Amazon Bedrock architecture
     [Architecture diagram: ingestion — documents in S3 → text extraction → chunk split → embeddings → OpenSearch Serverless; query — question → embeddings → retrieval from OpenSearch Serverless → answer generation → answer]
  5. Retrieve and generate an answer with a single API call
     (there is also an API that only performs retrieval)

     def retrieve_and_generate(question: str):
         response = client.retrieve_and_generate(
             input={"text": question},
             retrieveAndGenerateConfiguration={
                 "knowledgeBaseConfiguration": {
                     "knowledgeBaseId": knowledgeBaseId,
                     "modelArn": modelArn,
                     # Advanced RAG option: break the question into sub-queries
                     "orchestrationConfiguration": {
                         "queryTransformationConfiguration": {"type": "QUERY_DECOMPOSITION"}
                     },
                     # Use hybrid (keyword + vector) search for retrieval
                     "retrievalConfiguration": {
                         "vectorSearchConfiguration": {"overrideSearchType": "HYBRID"}
                     },
                 },
                 "type": "KNOWLEDGE_BASE",
             },
         )
         return response
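     For context, a minimal sketch of the setup this snippet assumes (client creation and identifiers are not shown in the deck; the values below are placeholders):

     import boto3

     # retrieve_and_generate is exposed by the bedrock-agent-runtime client
     client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

     # Placeholder identifiers: substitute your own knowledge base ID and model ARN
     knowledgeBaseId = "XXXXXXXXXX"
     modelArn = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

     # The generated answer is returned under output.text
     answer = retrieve_and_generate("Which regions have Kendra and Bedrock with Claude 3.5?")["output"]["text"]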
  6. Scoring of KB4AB
     • Ease of environment construction: ☆☆☆
       Can be built simply by operating the management console. There is also a "Quick create" option that automatically creates an OpenSearch Serverless collection.
     • Abundant features: ☆☆
       Features are updated frequently; a capability for building Advanced RAG was recently added, so selected RAG optimization methods can be applied easily.
     • Extensibility: ☆
       When new methods or new LLMs emerge, they cannot be used immediately.
     • Japanese Support: ☆
       The OpenSearch Serverless index created by the Quick create feature does not include settings for Japanese.
  7. The Ultimate RAG Showdown

                                          KB4AB
     Ease of environment construction     ☆☆☆
     Abundant features                    ☆☆
     Extensibility                        ☆
     Japanese Support                     ☆
  8. What is Kendra?
     • A managed enterprise search service
     • A wide range of data source connectors are available
     • Supports not only document search but also FAQ-style search
  9. KendRAG architecture (GenAI app)
     [Architecture diagram: ingestion — documents in S3 → text extraction → chunk split → Kendra index; query — question → generate search query (Bedrock) → retrieval from Kendra → answer generation (Bedrock) → answer]
  10. Process 1) Generate search query
      A step that creates search queries from the user's question before searching, using the search_queries_only capability of Cohere Command R/R+.
      Example: "Which regions have Kendra and Bedrock with Claude 3.5?"
      → "Regions where Kendra is provided" / "Regions where Bedrock has Claude 3.5"

      def generate_search_query(question: str):
          result = bedrock_runtime.converse(
              modelId="cohere.command-r-plus-v1:0",
              # Ask Command R+ to return search queries only, not an answer
              additionalModelRequestFields={"search_queries_only": True},
              additionalModelResponseFieldPaths=["/search_queries"],
              messages=[
                  {
                      "role": "user",
                      "content": [{"text": question}],
                  }
              ],
          )
          # Return just the text of each generated search query
          return list(
              map(
                  lambda x: x["text"],
                  result["additionalModelResponseFields"]["search_queries"],
              )
          )
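      A minimal usage sketch, assuming a standard bedrock-runtime client (the client setup is not shown in the deck):

      import boto3

      # Assumed setup (not shown in the deck)
      bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

      # The example question from the slide is decomposed into focused queries, e.g.
      # ["Regions where Kendra is provided", "Regions where Bedrock has Claude 3.5"]
      queries = generate_search_query("Which regions have Kendra and Bedrock with Claude 3.5?")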
  11. Process 2) Retrieval
      Fetch relevant documents from Kendra, once per generated search query.

      def fetching_relevant_documents(queries: list[str]):
          items = []
          for query in queries:
              response = kendra.retrieve(
                  IndexId=kendra_index_id,
                  QueryText=query,
                  # Only search documents indexed as Japanese
                  AttributeFilter={
                      "EqualsTo": {"Key": "_language_code", "Value": {"StringValue": "ja"}}
                  },
              )
              # Keep only the fields needed for answer generation
              items.extend(
                  list(
                      map(
                          lambda x: {
                              k: v
                              for k, v in x.items()
                              if k in ["Id", "DocumentId", "DocumentTitle", "Content", "DocumentURI"]
                          },
                          response["ResultItems"],
                      )
                  )
              )
          return items
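      A minimal sketch of the assumed setup and a call (the Kendra client and index ID are not shown in the deck; the index ID below is a placeholder):

      import boto3

      # Assumed setup (not shown in the deck); replace with your own index ID
      kendra = boto3.client("kendra", region_name="us-east-1")
      kendra_index_id = "00000000-0000-0000-0000-000000000000"

      # Each returned item keeps only Id, DocumentId, DocumentTitle, Content and DocumentURI
      documents = fetching_relevant_documents(["Regions where Kendra is provided"])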
  12. Process 3) Answer Generation
      Generate the answer with Bedrock. Passing the retrieved documents via the "documents" field works well with the Cohere Command R API.

      def generating_response(question: str, documents: list[dict]):
          result = bedrock_runtime.converse(
              modelId="cohere.command-r-plus-v1:0",
              # Pass the retrieved documents so Command R+ grounds its answer on them
              additionalModelRequestFields={"documents": documents},
              messages=[
                  {
                      "role": "user",
                      "content": [{"text": question}],
                  }
              ],
          )
          return result["output"]["message"]["content"][0]["text"]
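      Putting the three steps together, a sketch of the overall KendRAG flow (the deck does not show this wrapper function):

      def kendrag_answer(question: str) -> str:
          queries = generate_search_query(question)         # 1) question -> search queries
          documents = fetching_relevant_documents(queries)  # 2) search Kendra per query
          return generating_response(question, documents)   # 3) grounded answer generation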
  13. Scoring of KendRAG
      • Ease of environment construction: ☆☆
        You need to build a generative AI app yourself, but with generative AI frameworks such as LangChain this is not that difficult.
      • Abundant features: ☆
        You can use the search features provided by Kendra, but the connection to the generative AI has to be developed.
      • Extensibility: ☆☆☆
        Various RAG accuracy-improvement techniques can be tried and incorporated. It is also easy to swap out the generative AI or the search database.
      • Japanese Support: ☆☆
        Kendra officially supports Japanese.
  14. The Ultimate RAG Showdown

                                          KB4AB   KendRAG
      Ease of environment construction    ☆☆☆     ☆☆
      Abundant features                   ☆☆      ☆
      Extensibility                       ☆       ☆☆☆
      Japanese Support                    ☆       ☆☆
  15. What is OpenSearch Service?
      • An AWS-managed service that provides open source OpenSearch
      • Actively adding features that can be used for RAG:
        ◦ Vector search, neural search, hybrid search
        ◦ Integration with external AI models such as Bedrock and SageMaker
        ◦ Text chunking
        ◦ Reranking
        ◦ Conversational search, RAG
  16. OpenSearchRAG architecture
      [Architecture diagram: ingest pipeline — data source → text extraction → chunk split → embeddings → OpenSearch Service index; search pipeline — question → embeddings → retrieval → reranking → answer generation → answer; embeddings, reranking, and answer generation call external models on Bedrock / SageMaker]
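      The deck does not show the ingest pipeline definition; a rough sketch of what a chunking-plus-embedding ingest pipeline could look like, using OpenSearch's documented text_chunking and text_embedding processors (endpoint, field names, and model ID are placeholders):

      from opensearchpy import OpenSearch

      client = OpenSearch(hosts=["https://localhost:9200"])  # placeholder endpoint

      # Ingest pipeline: split the body into chunks, then embed each chunk
      client.ingest.put_pipeline(
          id="chunk-embed-ingest-pipeline",
          body={
              "processors": [
                  {
                      "text_chunking": {
                          "algorithm": {"fixed_token_length": {"token_limit": 384, "overlap_rate": 0.2}},
                          "field_map": {"body": "body_chunk"},
                      }
                  },
                  {
                      "text_embedding": {
                          "model_id": "<embedding model id>",  # e.g. a connector to Titan Embeddings
                          "field_map": {"body_chunk": "body_chunk_embedding"},
                      }
                  },
              ]
          },
      )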
  17. OpenSearch's search API
      By defining a search pipeline, you can get RAG results just by calling the search API.

      def search(query: str):
          response = client.search(
              index=index_name,
              body={
                  "_source": {"exclude": ["body_chunk_embedding"]},
                  # Hybrid query: lexical match + neural (vector) search over the chunks
                  "query": {
                      "hybrid": {
                          "queries": [
                              {"match": {"body_chunk": {"query": query}}},
                              {
                                  "nested": {
                                      "score_mode": "max",
                                      "path": "body_chunk_embedding",
                                      "query": {
                                          "neural": {
                                              "body_chunk_embedding.knn": {
                                                  "query_text": query,
                                                  "model_id": titan_model_id,
                                              }
                                          }
                                      },
                                  }
                              },
                          ]
                      }
                  },
                  # Parameters for the rerank and RAG response processors in the pipeline
                  "ext": {
                      "rerank": {"query_context": {"query_text": query}},
                      "generative_qa_parameters": {
                          "llm_model": "litellm",
                          "llm_question": query,
                          "context_size": 4,
                      },
                  },
              },
              params={"search_pipeline": "hybrid-rerank-search-pipeline"},
          )

          # Collect the retrieved chunks (minus the raw chunk text) as context,
          # and return them together with the generated answer
          context = list(map(lambda x: x["_source"], response["hits"]["hits"]))
          for tmp in context:
              del tmp["body_chunk"]
          return {
              "answer": response["ext"]["retrieval_augmented_generation"]["answer"],
              "context": context,
          }
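      The "hybrid-rerank-search-pipeline" referenced above is not shown in the deck; a rough sketch of such a pipeline using OpenSearch's documented normalization, rerank, and retrieval_augmented_generation processors (model IDs are placeholders, and the models must be registered via ml-commons beforehand):

      client.transport.perform_request(
          "PUT",
          "/_search/pipeline/hybrid-rerank-search-pipeline",
          body={
              # Normalize and combine the lexical and vector scores of the hybrid query
              "phase_results_processors": [
                  {
                      "normalization-processor": {
                          "normalization": {"technique": "min_max"},
                          "combination": {"technique": "arithmetic_mean"},
                      }
                  }
              ],
              "response_processors": [
                  # Rerank the hits with an external reranking model
                  {
                      "rerank": {
                          "ml_opensearch": {"model_id": "<rerank model id>"},
                          "context": {"document_fields": ["body_chunk"]},
                      }
                  },
                  # Generate the final answer from the top chunks
                  {
                      "retrieval_augmented_generation": {
                          "model_id": "<conversational model id>",
                          "context_field_list": ["body_chunk"],
                      }
                  },
              ],
          },
      )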
  18. Scoring of OpenSearchRAG
      • Ease of environment construction: ☆
        Built by combining various OpenSearch Service functions. The OpenSearch documentation only explains the individual functions, so construction is difficult.
      • Abundant features: ☆☆
        Functions are actively being expanded with RAG in mind, allowing hybrid search, reranking, chunk splitting, etc.
      • Extensibility: ☆
        Limited to what can be realized within the range of functions supported by OpenSearch.
      • Japanese Support: ☆☆
        Searches tailored to Japanese are possible using the kuromoji and Sudachi plugins.
  19. The Ultimate RAG Showdown

                                          KB4AB   KendRAG   OpenSearchRAG
      Ease of environment construction    ☆☆☆     ☆☆        ☆
      Abundant features                   ☆☆      ☆         ☆☆
      Extensibility                       ☆       ☆☆☆       ☆
      Japanese Support                    ☆       ☆☆        ☆☆
  20. Accuracy evaluation of RAG
      • The evaluation was carried out using "Ragas", a framework for quantitatively evaluating the accuracy of RAG.
      • Four metrics were used: faithfulness, answer_relevancy, context_precision, and context_recall.
        https://docs.ragas.io/en/stable/concepts/metrics/index.html
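      A minimal sketch of running these four metrics with Ragas (illustrative only, not the deck's actual evaluation script; the exact API may vary by Ragas version, and the judge LLM defaults to an OpenAI model unless configured otherwise):

      from datasets import Dataset
      from ragas import evaluate
      from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

      # One illustrative row per test question
      dataset = Dataset.from_dict(
          {
              "question": ["How does billable usage now appear in Amazon CloudWatch metrics for AWS Config?"],
              "answer": ["..."],        # answer produced by the RAG configuration under test
              "contexts": [["..."]],    # passages retrieved for the question
              "ground_truth": ["..."],  # Ragas-generated reference answer
          }
      )

      result = evaluate(
          dataset,
          metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
      )
      print(result)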
  21. Verification conditions for accuracy evaluation
      • AWS What's New articles published in 2024 in Japanese (1,267 articles in total)
      • Questions and answers were generated with Ragas and used as test data (200 items)
      • Test data generation and evaluation were performed using GPT-4o mini

      Ragas-generated question: How does billable usage now appear in Amazon CloudWatch metrics for AWS Config?
      Ragas-generated answer: Amazon CloudWatch metrics for monitoring AWS Config data usage now display only billable usage, eliminating the need to show non-billable usage, enabling you to validate your AWS Config configuration and usage, and associate billable usage with associated costs.
  22. Specific conditions for each architecture

                                  KB4AB                      KendRAG                  OpenSearchRAG
      LLM for answer generation   Claude 3 Sonnet            Command R+               Mistral Large 2
      LLM for embeddings          Titan Embeddings v2        unused                   Titan Embeddings v2
      Other features              • Hierarchical chunking    • Generate search query  • Sudachi analyzer
                                  • Break down queries       • Japanese index         • Reranking with Cohere Rerank 3
                                  • Hybrid search                                     • Hybrid search
                                                                                      • Text chunking
                                                                                      • Japanese support
  23. Accuracy evaluation results

                           KB4AB               KendRAG         OpenSearchRAG
                           (Claude 3 Sonnet)   (Command R+)    (Mistral Large 2)
      faithfulness         0.7904              0.8423          0.6197
      answer_relevancy     0.6914              0.6723          0.7345
      context_precision    0.9346              0.8814          0.8981
      context_recall       0.9675              0.9342          0.9400
  24. Example of answer generation (KB4AB)
      Ragas-generated question: How does billable usage now appear in Amazon CloudWatch metrics for AWS Config?
      KB4AB generated answer: Amazon CloudWatch metrics for AWS Config now show only billable usage. They no longer show usage for resource types that are not recorded, providing a clearer view of your billable usage. This enables you to use CloudWatch metrics to validate your AWS Config configuration and usage, and to correlate billable usage with associated costs.
  25. Example of answer generation (KendRAG)
      Ragas-generated question: How does billable usage now appear in Amazon CloudWatch metrics for AWS Config?
      KendRAG generated answer: AWS Config monitors resource configuration changes by generating configuration items (CIs) for each resource type recorded. Now you can see only the usage that is billable because with this launch, Amazon CloudWatch metrics for AWS Config usage only show usage for the resource types that are recorded.
  26. Example of answer generation (OpenSearchRAG)
      Ragas-generated question: How does billable usage now appear in Amazon CloudWatch metrics for AWS Config?
      OpenSearchRAG generated answer: The difference between AWS Config and Amazon CloudWatch Metrics is that AWS Config is a service for recording configuration changes of resources and auditing for compliance, while Amazon CloudWatch Metrics is a service for collecting and monitoring performance data of resources. AWS Config helps you track configuration changes of resources and check compliance, while CloudWatch Metrics helps you monitor the performance of your resources in real time.
  27. Accuracy evaluation considerations
      • The evaluation results varied widely, so it is not possible to say which configuration is best.
      • Looking at the responses individually, none of the configurations performed poorly; all reached a certain level of quality.
      • The results may change depending on the evaluation conditions, so an "evaluation of the evaluation method" may be necessary:
        ◦ the LLM used for evaluation
        ◦ the document format used for evaluation
      • I hope that RAG evaluation will become possible with Bedrock's model evaluation feature.
  28. Overall results

                                          KB4AB   KendRAG   OpenSearchRAG
      Ease of environment construction    ☆☆☆     ☆☆        ☆
      Abundant features                   ☆☆      ☆         ☆☆
      Extensibility                       ☆       ☆☆☆       ☆
      Japanese Support                    ☆       ☆☆        ☆☆
      Accuracy                            ☆☆      ☆☆        ☆☆
  29. The verification code is published on GitHub.
      I had a hard time building the OpenSearchRAG configuration, so please take a look.
      https://github.com/moritalous/ultimate_rag_showdown