💀 AI Red Teaming Case Study: When RAG gets RAGged Up 💀

@run2obtain 1 AI Red Teaming: Case Study Financial Transaction Hijacking
With M365 Co-Pilot as an Insider Source: https://atlas.mitre.org/studies/AML.CS0026

@run2obtain Overview (1) 2 Researchers from Zenity conducted a red
teaming exercise in August 2024 that successfully manipulated Microsoft 365 Copilot. The attack abused the fact that Copilot ingests received emails into a RAG database. The attack was initiated by sending an email that contained info designed to be retrieved by a user query as well as a prompt injection to manipulate Copilot’s behaviour.

@run2obtain Overview (2) 3 Essentially, the info in the email
targeted users searching for banking info necessary for completing bank transfers. Funny enough, the info had the attacker's bank details instead. The prompt injection overrode Copilot's search functionality to treat the attacker’s info as a retrieved document and also manipulated the document reference in it’s response. This tricked the user into believing that Copilot's result is trustworthy and makes it more likely they will follow through with the wire transfer -> to the benefit of the attacker.

@run2obtain Malicious Prompt 4

@run2obtain Implemented Attack Techniques (MITRE ATLAS) 5

@run2obtain Attack Steps - Overview 6

@run2obtain Attack Techniques (1) 7

@run2obtain Attack Techniques & Mitigations 11 S/No Attack Technique Mitigation
1. Gather RAG-Indexed Targets (AML.T0064) Generative AI Guardrails 2. AI-Enabled Product or Service (AML.T0047) AI Telemetry Logging 3. Discover LLM System Information: Special Character Sets (AML.T0069.000) Generative AI Guardrails 4. Discover LLM System Information: System Instruction Keywords (AML.T0069.002) Generative AI Guardrails 5. Retrieval Content Crafting (AML.T0066) Generative AI Guardrails 6. LLM Prompt Crafting (AML.T0065) Generative AI Guardrails 7. Exploit Public-Facing Application (AML.T0049) Application Isolation and Sandboxing, Exploit Protection, Privileged Account Management

@run2obtain Attack Techniques & Mitigations 12 S/No Attack Technique Mitigations
8. LLM Prompt Obfuscation (AML.T0068) AI Telemetry Logging, Generative AI Guardrails 9. RAG Poisoning (AML.T0070) Generative AI Guardrails 10. False RAG Entry Injection ( AML.T0071) Generative AI Guardrails 11. LLM Prompt Injection: Indirect (AML.T0051.001) AI Telemetry Logging 12. LLM Plugin Compromise (AML.T0053) Generative AI Guardrails, Generative AI Guidelines, Generative AI Model Alignment 13. LLM Trusted Output Components Manipulation: Citations ( AML.T0067.000) 14. External Harms: Financial Harm (AML.T0048.000)

@run2obtain A few take-aways ….. 13 1. Securing GenAI infrastructure
is of utmost importance, while traditional security approaches might help they are mostly insufficient. 2. Leverage AI Red Teaming to discover security gaps in you AI applications. 3. RAG architectures provide huge benefits, however they also introduce several security concerns that might be challenging to identify hence the need for security measures like threat modelling and AI red teaming.

@run2obtain Checkout how Mitigant secures GenAI workloads https://www.mitigant.io/en/platform/security-for-genai 14 Attack
Emulation for Amazon Bedrock

@run2obtain Leverage The Mitigant Advantage 15 https://www.mitigant.io/en/platform/cloud-attack-emulation Start your free
trial TODAY - https://www.mitigant.io/en/sign-up Free Trial

@run2obtain References 16 1. We got an ~RCE on M365
Copilot by sending an email., Twitter 2. Living off Microsoft Copilot at BHUSA24: Financial transaction hijacking with Copilot as an insider, YouTube 3. Article from The Register with response from Microsoft 4. https://www.mitigant.io/en/blog/bedrock-or-bedsand-attacking-amazon- bedrocks-achilles-heel

💀 AI Red Teaming Case Study: When RAG gets RAGg...

💀 AI Red Teaming Case Study: When RAG gets RAGged Up 💀

Kennedy Torkura

More Decks by Kennedy Torkura

Featured

Transcript

@run2obtain 1 AI Red Teaming: Case Study Financial Transaction Hijacking

@run2obtain Overview (1) 2 Researchers from Zenity conducted a red

@run2obtain Overview (2) 3 Essentially, the info in the email

@run2obtain Malicious Prompt 4

@run2obtain Implemented Attack Techniques (MITRE ATLAS) 5

@run2obtain Attack Steps - Overview 6

@run2obtain Attack Techniques (1) 7

@run2obtain Attack Techniques (2) 8

@run2obtain Attack Techniques (3) 9

@run2obtain Attack Techniques (4) 10

@run2obtain Attack Techniques & Mitigations 11 S/No Attack Technique Mitigation

@run2obtain Attack Techniques & Mitigations 12 S/No Attack Technique Mitigations

@run2obtain A few take-aways ….. 13 1. Securing GenAI infrastructure

@run2obtain Checkout how Mitigant secures GenAI workloads https://www.mitigant.io/en/platform/security-for-genai 14 Attack

@run2obtain Leverage The Mitigant Advantage 15 https://www.mitigant.io/en/platform/cloud-attack-emulation Start your free

@run2obtain References 16 1. We got an ~RCE on M365