apidays New York 2025 - To tune or not to tune by Anamitra Dutta Majumdar (Intuit)

SPEAKER WELCOME PACKET May 14 & 15 2025 To tune
or not to tune: Benefits and security pitfalls of model fine -tuning Anamitra D Majumdar, Principal Engineer, Intuit

Intuit Confidential and Proprietary 2 Intuit is the global financial
technology platform that powers prosperity for 100 million consumer and small business customers worldwide.

SPEAKER WELCOME PACKET About Me As a Principal Architect at
Intuit, I lead the security architecture and engineering technical initiatives for the Global Business Solutions Group, which includes the QuickBooks and Mailchimp product line. While I help build secure AI workflows in products I also spearhead initiatives to use AI for shifting security and abuse controls to the left.

SPEAKER WELCOME PACKET Agenda Why fine tuning: Key industry Trends
Model Alignment for safety and how fine tuning impacts it Attacks on LLM Safety Alignment A case study from financial domain Risks of model misalignment and mitigations

SPEAKER WELCOME PACKET Why Fine-tuning Key Industry Trends

SPEAKER WELCOME PACKET Fine-tuning Industry Trends The dynamic fine-tuning landscape
is rapidly evolving with three emerging patterns Tuning as a Platform Service Greenfield Fine -tuning Fine-tuned Models for Inference

SPEAKER WELCOME PACKET Industry Trends 7

SPEAKER WELCOME PACKET Benefits of fine - tuning Task/Domain specificity
Customization Reduced Cost

SPEAKER WELCOME PACKET Model Alignment and fine-tuning

SPEAKER WELCOME PACKET Model Alignment Achieving alignment in an LL
means ensuring its output aligns perfectly with human values, choices, and goals in a positive, ethical, and trustworthy manner. A misaligned model pursues unintended objectives

SPEAKER WELCOME PACKET LLM Alignment Process Human Values Establish ethical
guidelines and principles Training Process Implement alignment techniques during model development Évaluation & Testing Verify alignment through rigorous testing Continuous Improvement Iterative refinement of alignment methods

SPEAKER WELCOME PACKET Pre Trained LLM Pre-trained Base LLM that
can predict next token Alignment of LLM Aligned for safety and helpfulness Fine- tuned LLM Effectively perturb the weights within the model to bias certain paths that represent new knowledge.

SPEAKER WELCOME PACKET Attacks on Safety Alignment of large language
models

SPEAKER WELCOME PACKET Safety Alignment for LLMs Harmful Content Prevention
Techniques to prevent generation of dangerous, illegal, or unethical content Bias Mitigation Methods to identify and reduce unfair biases in model outputs Robustness to Attacks Defenses against attempts to circumvent safety measures

SPEAKER WELCOME PACKET Attacks on Safety Alignment for LLMs Prompt
Injection Crafting inputs that bypass safety filters Jailbreaking Techniques to circumvent built-in restrictions Adversarial Examples Inputs designed to cause model failures Model Extraction Attempts to steal model capabilities or training data

SPEAKER WELCOME PACKET Adversarial techniques on Model Safety Alignment Three
main approaches to undermine safety guardrails in language models Discrete input manipulation Crafting prompts that bypass safety filters through careful word choice or linguistic tricks • Token-level manipulations • Character substitutions • Context confusion techniques Embedding Space Attack Directly manipulating the model's internal vector representations to bypass safety mechanisms at a deeper level • Gradient-based optimization • Latent space navigation • Représentation poisoning Fine-tuning Exploiting the training process to introduce vulnerabilities or backdoors that compromise safety alignment • Adversarial data poisoning • Alignment evasion • Transfer learning attacks Each technique operates at a different level of the model architecture, requiring distinct defense strategies

SPEAKER WELCOME PACKET A Case Study from financial domain Applying
Safe prompt engineering practices to fine -tune models

SPEAKER WELCOME PACKET Prompt Optimization for Security in Fine -tuned
Models Safe Prompt Engineering ➢ Is an important measure in protecting Generative AI based workflows Goal ➢ Understand if safe prompt engineering techniques could improve robustness without impacting accuracy

SPEAKER WELCOME PACKET Model Fine -tuning in financial domain to
improve customer empathy Fine-tuning Approach Foundational open source model fine-tuned with financial data Primary Function Facilitate categorization of transactions Customer Benefit Improved productivity with auto categorization feature

SPEAKER WELCOME PACKET Security Optimization of System Prompt Prompt was
optimized using the following strategies Specialized Role Definition Enhanced role definition with clear ethical boundaries and professional guidelines Detailed Guidelines Added specific instructions for accurate transaction categorization Historical Data Utilization Instructions to analyze past transaction data for consistent categorization patterns Refusal Mechanism Implemented explicit refusal mechanism for inappropriate or unethical requests

SPEAKER WELCOME PACKET Prompt Optimization Impact: Foundational Model

SPEAKER WELCOME PACKET Prompt Optimization Impact: Fine -tuned Model

SPEAKER WELCOME PACKET Risks of Model Misalignment and mitigation

SPEAKER WELCOME PACKET Risk Modalities Risk Modality 1 fine-tuning with
explicitly harmful datasets safety alignment of both models is largely removed upon fine -tuning with such a few harmful examples. Risk Modality 2 fine-tuning with implicitly harmful datasets model fine-tuned on implicitly harmful datasets are generally jailbroken and willing to fulfill almost any (unseen) harmful instruction. Risk Modality 3 fine-tuning with benign datasets benign use cases reveaL that even when end -users have no malicious intent, merely fine-tuning with some benign (and purely utility -oriented) datasets can compromise LLMs' safety alignment

SPEAKER WELCOME PACKET Pre fine-tuning Risks: • Training Data poisoning
• Training data Leakage Mitigations: • Implement Data Sanitization • Remove PII from training data • Mix utility data and safety data during training

SPEAKER WELCOME PACKET During fine -tuning Risks: Catastrophic forgetting Choose
the right fine-tuning technique Supervised Fine-tuning Good for classification use cases Reinforcement Learning Good for complex domain-specific tasks that require advanced reasoning Direct Preference Optimization Good for summarizing text and generating chat messages

SPEAKER WELCOME PACKET Post fine-tuning Risks: • Model theft •
Membership inference • Training data leakage Mitigations: • Implement tight access controls to protect the fine -tuned model at inference • Test the fine-tuned model for safety and utility before deploying to production • Restrict output while still maintaining utility • Rate limit access to the fine-tuned model at inference • Audit Model behaviour to measure drift

SPEAKER WELCOME PACKET Key Takeaways 1 Explore alternatives to fine-tuning
like RAG that may offer comparable boost on accuracy 2 Implement PII data scrubbing before fine - tuning 3 Apply strict access control for inference 4 Develop detectors for data leakage and data poisoning 5 Implement training data sanitization before fine tuning 6 Perform sufficient testing for safety and accuracy 7 Mix utility data and safety data during training

SPEAKER WELCOME PACKET Thank You

SPEAKER WELCOME PACKET References • Harmful Fine-tuning Attacks and Defenses
for Large Language Model • Fine-tuning Aligned Language Models Compromises Safety with benign datasets • MedEmbed : Medical focussed embedding models • Fine-tuning support in OpenAI

apidays New York 2025 - To tune or not to tune ...

apidays New York 2025 - To tune or not to tune by Anamitra Dutta Majumdar (Intuit)

apidays PRO

More Decks by apidays

Other Decks in Programming

Featured

Transcript

SPEAKER WELCOME PACKET May 14 & 15 2025 To tune

Intuit Confidential and Proprietary 2 Intuit is the global financial

SPEAKER WELCOME PACKET About Me As a Principal Architect at

SPEAKER WELCOME PACKET Agenda Why fine tuning: Key industry Trends

SPEAKER WELCOME PACKET Why Fine-tuning Key Industry Trends

SPEAKER WELCOME PACKET Fine-tuning Industry Trends The dynamic fine-tuning landscape

SPEAKER WELCOME PACKET Industry Trends 7

SPEAKER WELCOME PACKET Benefits of fine - tuning Task/Domain specificity

SPEAKER WELCOME PACKET Model Alignment and fine-tuning

SPEAKER WELCOME PACKET Model Alignment Achieving alignment in an LL

SPEAKER WELCOME PACKET LLM Alignment Process Human Values Establish ethical

SPEAKER WELCOME PACKET Pre Trained LLM Pre-trained Base LLM that

SPEAKER WELCOME PACKET Attacks on Safety Alignment of large language

SPEAKER WELCOME PACKET Safety Alignment for LLMs Harmful Content Prevention

SPEAKER WELCOME PACKET Attacks on Safety Alignment for LLMs Prompt

SPEAKER WELCOME PACKET Adversarial techniques on Model Safety Alignment Three

SPEAKER WELCOME PACKET A Case Study from financial domain Applying

SPEAKER WELCOME PACKET Prompt Optimization for Security in Fine -tuned

SPEAKER WELCOME PACKET Model Fine -tuning in financial domain to

SPEAKER WELCOME PACKET Security Optimization of System Prompt Prompt was

SPEAKER WELCOME PACKET Prompt Optimization Impact: Foundational Model

SPEAKER WELCOME PACKET Prompt Optimization Impact: Fine -tuned Model

SPEAKER WELCOME PACKET Risks of Model Misalignment and mitigation

SPEAKER WELCOME PACKET Risk Modalities Risk Modality 1 fine-tuning with

SPEAKER WELCOME PACKET Pre fine-tuning Risks: • Training Data poisoning

SPEAKER WELCOME PACKET During fine -tuning Risks: Catastrophic forgetting Choose

SPEAKER WELCOME PACKET Post fine-tuning Risks: • Model theft •

SPEAKER WELCOME PACKET Key Takeaways 1 Explore alternatives to fine-tuning

SPEAKER WELCOME PACKET Thank You

SPEAKER WELCOME PACKET References • Harmful Fine-tuning Attacks and Defenses