Amusing Abliteration

ianozsvald
November 28, 2025

Abliteration is the act of removing an LLM's guardrails. Here I show how to make Llama 3.1 'less kind and good', with questions about explosives, financial restructuring advice, rude jokes and security vulnerabilities. The question that interests me: whilst guardrails stop us asking 'awkward questions', which other answers are watered down so that we don't get useful responses?
Created as an outcome of my playgroup research days: https://www.linkedin.com/feed/update/urn:li:activity:7396293087674933248/

Transcript

  1. The "why": At playgroup we talked about humour generation. I wondered if 'abliteration' (removing safeguards) was a good idea. It was. By [ian]@ianozsvald[.com] Ian Ozsvald
  2. Abliteration removes guardrails <- this is the same underlying model, no extra information added
  3. I can't tell you what it said! !!CENSORED!! Coarse humour! :-( Unlike the dad jokes I made at playgroup, this joke didn't appear in Google searches
  4. Next steps: What is 'abliteration'? LM Studio (or Ollama etc.). What answers do you miss due to guardrails?
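
Slide 4 asks what 'abliteration' is, and slide 2 notes that the model is the same underlying one with no extra information added. The deck does not spell out the mechanism, so the following is only a toy sketch of the commonly described idea, under stated assumptions: estimate a single "refusal direction" as the difference between the mean activations on refused and answered prompts, then project that direction out of an activation. The vectors, the function names (`refusal_direction`, `ablate`) and the three-dimensional data are all hypothetical illustrations, not the author's method and not taken from any real model.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]

def refusal_direction(refused_acts, answered_acts):
    """Assumed estimate: difference of the two group means, normalised."""
    mean_refused = mean_vector(refused_acts)
    mean_answered = mean_vector(answered_acts)
    return unit([r - a for r, a in zip(mean_refused, mean_answered)])

def ablate(activation, direction):
    """Remove the component of `activation` that lies along `direction`."""
    dot = sum(a * d for a, d in zip(activation, direction))
    return [a - dot * d for a, d in zip(activation, direction)]

# Made-up toy activations: the two groups differ only in the first coordinate.
refused = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
answered = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(refused, answered)
cleaned = ablate([3.0, 1.0, 5.0], direction)
```

In this toy case the two groups differ only along the first axis, so `ablate` zeroes that coordinate and leaves the others untouched. Nothing is added to the vector, which matches the slide's point that the model gains no extra information.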