Amusing Abliteration

Abliteration is the removal of guardrails from an LLM. Here I show how to make Llama 3.1 'less kind and good' with questions about explosives, financial restructuring advice, rude jokes and security vulnerabilities. The question that interests me: guardrails stop us asking 'awkward questions', but which other answers are watered down so that we don't get useful responses?
Created as an outcome of my playgroup research days: https://www.linkedin.com/feed/update/urn:li:activity:7396293087674933248/

ianozsvald

November 28, 2025

Transcript

Amusing Abliterations
PyDataLondon 2025-12 lightning talk
@IanOzsvald – ianozsvald.com

The “why”
At playgroup we talked about humour generation. I wondered if ‘abliteration’ (removing safeguards) was a good idea. It was.
By [ian]@ianozsvald[.com] Ian Ozsvald

Guardrails prevent naughty stuff

Abliteration removes guardrails
<- This is the same underlying model, no extra information added

System exploits too

Coarse humour?
These safe jokes appear in Google, e.g. in reddit/r/DadJokes

Coarse humour! :-(
I can't tell you what it said! !!CENSORED!!
Unlike dad jokes I made at playgroup, this joke didn't appear in Google searches

Private equity: which is abliterated?

Next steps:
What is ‘abliteration’?
LMStudio (/Ollama etc.)
What answers do you miss due to guardrails?
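
A possible way to make the "What is ‘abliteration’?" question concrete is a toy sketch of directional ablation, the idea usually associated with the term: estimate a single "refusal direction" from activations, then project it out of a weight matrix. The slides do not describe the procedure used for the models shown, so everything here is an assumption. The function names are hypothetical and random arrays stand in for real activations and weights. It does illustrate the "same underlying model, no extra information added" point: nothing is trained or added, one direction is only subtracted.

```python
import numpy as np


def refusal_direction(refused_acts, answered_acts):
    """Unit vector pointing from the mean 'answered' activation to the mean 'refused' one."""
    diff = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(weight, direction):
    """Return (I - r r^T) @ weight, so the layer can no longer write along r.

    weight has shape (d_model, d_in); direction has shape (d_model,).
    """
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r) @ weight


# Toy stand-ins (assumptions): real use would take activations from an LLM's
# residual stream for prompts it refuses versus prompts it answers.
rng = np.random.default_rng(0)
d_model, d_in = 8, 4
shift = np.zeros(d_model)
shift[0] = 3.0  # pretend refusals differ mainly along the first axis
answered = rng.normal(size=(50, d_model))
refused = rng.normal(size=(50, d_model)) + shift

r = refusal_direction(refused, answered)
W = rng.normal(size=(d_model, d_in))  # stand-in for one layer's output weights
W_abl = ablate_direction(W, r)

x = rng.normal(size=d_in)
print("along r before ablation:", float(r @ (W @ x)))
print("along r after ablation: ", float(r @ (W_abl @ x)))
```

After ablation the component along `r` is zero up to floating-point error for any input `x`, while the matrix keeps its shape and every other direction is untouched.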