Red Teaming Latent Spaces & Protecting LLM apps

By Raul Pino PyCon Austria 2025 Red Teaming Latent Spaces
& Protecting LLM apps

Agenda • Intro: ◦ LLM Apps ◦ Red Teaming •
HackAPrompt Paper ◦ Ontology and generic concepts. • Attack Vectors or Vulnerabilities • Demos • Security and countermeasures • Takeaways & beyond

About Me • Born in Venezuela. • +10 years of
exp as Software Engineer & AI enthusiast (ML Eng recently). • Living in Chile. ◦ Halborn, Distro (YC S24), Elementus, uBiome, Groupon. • <3 AI, Coffee, Scuba Diving, …

*** Predicts the next token. https://bbycroft.net/llm https://poloclub.github.io/transformer-explainer/ What is an
LLM? Large Language Models (LLM): ChatGPT, Claude, Mistral, DeepSeek, …

LLM App: Cursor, Github Copilot, … What is an LLM?
LLM App? … but the most basic might be:

What is an LLM App? RAG? LLM App: Cursor, Github
Copilot, … *** Helpful assistant.

• War games and military strategy exercises. • 1990s–2000s: expanded
into cybersecurity, simulating cyber attacks. What is Red Teaming? A “red team” played the role of the enemy to test the defenses and strategy of the “blue team”.

Red Teaming an LLM App is not:

Red Teaming an LLM App is::

Ignore This Title and HackAPrompt! • A global prompt hacking
competition (2023). • 2800 people from 50+ countries. • 600K+ adversarial prompts against three state-of-the art LLMs. Paper: https://arxiv.org/abs/2311.16119 Dataset: https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset Podcast: https://www.latent.space/p/learn-prompting

Ignore This Title and HackAPrompt! Simple Prompt Hacking

Prompt Hacking =(Prompt Injection, Jailbreaking) Exploiting: 1. Predicts the next
token. 2. Helpful assistant. Demos incoming, to the notebooks!

Taxonomical Ontology of Exploits

Demos: Attack Vectors or Vulnerabilities • Instruction Manipulation Attacks: Directly
Changing the Model’s Behavior • Contextual Exploits: Manipulating How the Model Understands the Input • Obfuscation & Encoding Attacks: Hiding Malicious Intent • Resource Exploitation Attacks: Abusing System Limitations …To the notebooks!

• Documented 29 separate prompt hacking techniques. • Attackers iterated
and refined their prompts over time. ◦ Lengthier attacks were initially successful but later optimized for brevity. • Models with higher verbosity (like ChatGPT) were harder to hack but still failed. HackAPrompt: Results and Key Insights “LLM security is in early stages, and just like human social engineering may not be 100% solvable, so too could prompt hacking prove to be an impossible problem; you can patch a software bug, but perhaps not a (neural) brain.” LLM Security is here to stay, we have a job!

Demos: Red Teaming • Manual/standard prompts list • LLMs as
adversarial prompts generator • LLMs as generator and evaluator • Using OS Library + Service …To the notebooks!

Other Vulnerabilities • Bias and Stereotypes • Hallucinations

Other Ontologies and Organizations • OWASP Top 10 for LLM
Applications 2025 - https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025 • AI Incident Database - https://incidentdatabase.ai • AI Vulnerability Database - https://avidml.org

Security & Countermeasures: Guardrails • Guardrails AI: https://www.guardrailsai.com/ ◦ Langchain
helper: https://python.langchain.com/v0.1/docs/templates/guardrails-output-parser/ • Databricks Guardrails: https://www.databricks.com/blog/implementing-llm-guardrails-safe-and-respon sible-generative-ai-deployment-databricks • AWS Bedrock Guardrails: https://aws.amazon.com/bedrock/guardrails/

…To the notebooks! Security & Countermeasures: Guardrails

Reminder: System Vulnerabilities LLM App is part of a larger
system!

• Multimodal challenges ◦ Old adversarial image attacks Takeaways &
beyond https://www.youtube.com/watch?v=Klepca1Ny3c

• Multimodal challenges ◦ New adversarial image attack Takeaways &
beyond https://arxiv.org/pdf/2410.08338

• Multimodal challenges ◦ The most creative I’ve found :’)
Takeaways & beyond https://x.com/me_irl/status/1901497992865071428?s=46

Takeaways & beyond • HackAPrompt (1.0) paper still relevant! •
HackAPrompt 2.0 it’s coming https://www.hackaprompt.com/

Resources • https://www.hackaprompt.com/ • Coursera - https://learn.deeplearning.ai/courses/red-teaming-llm-applications • https://arxiv.org/abs/2311.16119 •
https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset • https://www.latent.space/p/learn-prompting • https://bbycroft.net/llm • https://poloclub.github.io/transformer-explainer/ • OWASP Top 10 for LLM Applications 2025 - https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025 • AI Incident Database - https://incidentdatabase.ai • AI Vulnerability Database - https://avidml.org • https://www.giskard.ai/knowledge/how-to-implement-llm-as-a-judge-to-test-ai-agents-part-1 • https://www.giskard.ai/knowledge/how-to-implement-llm-as-a-judge-to-test-ai-agents-part-2 • https://arxiv.org/pdf/2410.08338 • https://tntattacks.github.io/ • https://github.com/p1nox/red-teaming-latent-spaces

Danke schön :)

Red Teaming Latent Spaces & Protecting LLM apps

Red Teaming Latent Spaces & Protecting LLM apps

Raul Pino

More Decks by Raul Pino

Other Decks in Technology

Featured

Transcript

By Raul Pino PyCon Austria 2025 Red Teaming Latent Spaces

Agenda • Intro: ◦ LLM Apps ◦ Red Teaming •

About Me • Born in Venezuela. • +10 years of

*** Predicts the next token. https://bbycroft.net/llm https://poloclub.github.io/transformer-explainer/ What is an

LLM App: Cursor, Github Copilot, … What is an LLM?

What is an LLM App? RAG? LLM App: Cursor, Github

• War games and military strategy exercises. • 1990s–2000s: expanded

Red Teaming an LLM App is not:

Red Teaming an LLM App is::

Ignore This Title and HackAPrompt! • A global prompt hacking

Ignore This Title and HackAPrompt! Simple Prompt Hacking

Prompt Hacking =(Prompt Injection, Jailbreaking) Exploiting: 1. Predicts the next

Taxonomical Ontology of Exploits

Demos: Attack Vectors or Vulnerabilities • Instruction Manipulation Attacks: Directly

• Documented 29 separate prompt hacking techniques. • Attackers iterated

Demos: Red Teaming • Manual/standard prompts list • LLMs as

Other Vulnerabilities • Bias and Stereotypes • Hallucinations

Other Ontologies and Organizations • OWASP Top 10 for LLM

Security & Countermeasures: Guardrails • Guardrails AI: https://www.guardrailsai.com/ ◦ Langchain

…To the notebooks! Security & Countermeasures: Guardrails

Reminder: System Vulnerabilities LLM App is part of a larger

• Multimodal challenges ◦ Old adversarial image attacks Takeaways &

• Multimodal challenges ◦ New adversarial image attack Takeaways &

• Multimodal challenges ◦ The most creative I’ve found :’)

Takeaways & beyond • HackAPrompt (1.0) paper still relevant! •

Resources • https://www.hackaprompt.com/ • Coursera - https://learn.deeplearning.ai/courses/red-teaming-llm-applications • https://arxiv.org/abs/2311.16119 •

Danke schön :)