A 5-minute lightning talk introducing llama.cpp, showing how to run GGUF models on the CPU without needing a GPU. It demonstrates Llama 2, WizardCoder, and LLaVA (multimodal), with the command-line arguments used and links to the source GGUF files.
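As a rough sketch of the kind of invocation shown in the talk (the repository layout and flags below are llama.cpp's, but the model filename and prompt are illustrative, not the exact ones from the slides):

```shell
# Build llama.cpp from source (CPU-only by default; no GPU required).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Run a GGUF model on the CPU. The model filename is illustrative;
# download a quantised GGUF file first (e.g. from Hugging Face)
# and point -m at it.
./main -m models/llama-2-7b.Q4_K_M.gguf \
  -p "Explain what a GGUF file is in one sentence." \
  -n 128 \
  -t 8   # -n: max tokens to generate, -t: number of CPU threads
```

Running this requires a downloaded GGUF model file, so it is a sketch rather than a copy-paste recipe.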
To be written up at: https://notanumber.email/ and https://ianozsvald.com/
License: Creative Commons Attribution (CC BY)