Seguridad y auditorías en Modelos grandes del lenguaje (LLM)

UCIBER 2025 JOSÉ MANUEL ORTEGA CANDEL UCIBER CONGRESS 2025 Seguridad
y auditorías en Modelos grandes del lenguaje (LLM)

UCIBER 2025 What are we working on? Ingeniería en informática
Master ciberseguridad y ciencia de datos Consultoría y docencia universitaria https://josemanuelortegablog.com https://www.linkedin.com/in/jmortega1

UCIBER 2025 What are we working on? https://scholar.google.es/citations?user=kAM9WrcAAAAJ&hl=es

UCIBER 2025 Introducción a LLM Herramientas de auditoría en aplicaciones
que manejan modelos LLM Caso de uso con la herramienta textattack para realizar ataques adversarios 01. Introducción al OWASP LLM Top 10 Seguridad en aplicaciones que manejan modelos LLM 02. 03. 04. 05. Table of contents

UCIBER 2025 Introducción a LLM Transformers Attention is All You
Need" by Vaswani et al. in 2017 Mecanismo auto-atención Arquitectura Encoder-Decoder

UCIBER 2025 Introducción a LLM

UCIBER 2025 Introducción a LLM Pre-training + fine-tuning

UCIBER 2025 Introducción a LLM https://genai.owasp.org

UCIBER 2025 Introducción al OWASP LLM Top 10

UCIBER 2025 Prompt Injection

UCIBER 2025 Jailbreak prompts

UCIBER 2025 Data Poisoning

UCIBER 2025 Adversarial Attacks

UCIBER 2025 Adversarial Attacks Pequeñas perturbaciones Vulnerabilidades del modelo Impacto
en sistemas críticos

UCIBER 2025 Adversarial Attacks 1. Prompt Injection 2. Evasion Attacks
3. Poisoning Attacks 4. Model Inversion Attacks 5. Model Stealing Attacks 6. Membership Inference Attacks

UCIBER 2025 AI GOAT https://github.com/orcasecurity-research/AIGoat

UCIBER 2025 AI GOAT https://github.com/orcasecurity-research/AIGoat Ataques de extracción de modelos
Ataques de envenenamiento de datos Ataques adversarios

UCIBER 2025 AI GOAT https://github.com/orcasecurity-research/AIGoat

UCIBER 2025 Herramientas para evaluar la robustez de los modelos
FGSM (Fast Gradient Sign Method) PGD (Projected Gradient Descent) DeepFool

PromptInject Framework https://github.com/agencyenterprise/PromptInject PAIR - Prompt Automatic Iterative Refinement https://github.com/patrickrchao/JailbreakingLLMs TAP - Tree of Attacks with Pruning https://github.com/RICommunity/TAP

https://github.com/tensorflow/fairness-indicators

https://deepeval.com/ Evaluación Automatizada y Objetiva Métricas de Evaluación Integradas Integración con otros LLM

https://deepeval.com/

UCIBER 2025 Herramientas de auditoría https://huggingface.co/meta-llama/Prompt-Guard-86M

UCIBER 2025 Herramientas de auditoría https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/8B/MODEL_CARD.md

UCIBER 2025 Herramientas de auditoría Filtrado dinámico de entrada Normalización
y contextualización del prompt Políticas de respuesta segura Monitorización activa y respuesta automática

UCIBER 2025 Herramientas de auditoría

UCIBER 2025 Text attack https://arxiv.org/pdf/2005.05909

UCIBER 2025 Text attack https://github.com/QData/TextAttack

UCIBER 2025 Text attack https://github.com/QData/TextAttack Original Text: "I absolutely loved
this movie! The plot was thrilling, and the acting was top-notch." Adversarial Text: "I completely liked this film! The storyline was gripping, and the performance was outstanding."

UCIBER 2025 Text attack https://github.com/QData/TextAttack from textattack.augmentation import WordNetAugmenter #
Use WordNet-based augmentation to create adversarial examples augmenter = WordNetAugmenter() # Augment the training data with adversarial examples augmented_texts = augmenter.augment(text) print(augmented_texts)

UCIBER 2025

UCIBER 2025 github.com/greshake/llm-security github.com/corca-ai/awesome-llm-security github.com/facebookresearch/PurpleLlama github.com/protectai/llm-guard github.com/cckuailong/awesome-gpt-security github.com/jedi4ever/learning-llms-and-genai-f or-dev-sec-ops github.com/Hannibal046/Awesome-LLM

Thank you! UCIBER CONGRESS

Seguridad y auditorías en Modelos grandes del l...

Seguridad y auditorías en Modelos grandes del lenguaje (LLM)

More Decks by jmortegac

Other Decks in Technology

Featured

Transcript