ARC AGI Kaggle with llama3 - First Steps

Lightning talk at PyDataLondon, July 2024. I spoke about the Kaggle ARC AGI competition and how I got Llama 3 to write code that solves a couple of the challenges.
https://www.meetup.com/pydata-london-meetup/events/301796857/

ianozsvald

August 06, 2024

Transcript

ARC AGI Kaggle with llama3
First Steps
PyDataLondon 2024-07 lightning talk
@IanOzsvald – ianozsvald.com

Abstraction & Reasoning Challenge
LLMs are great at memorisation, but can they reason? F. Chollet argues that they’re bad at reasoning
$1M prize if an LLM (or other approach) can solve these challenges
Abstract shapes, “initial → target”, in JSON
Open-weights models only (runs in an off-line environment)
By [ian]@ianozsvald[.com] Ian Ozsvald

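To make the “initial → target” JSON idea concrete, here is a minimal sketch of reading one task. It assumes the public ARC layout of "train" and "test" lists holding "input" and "output" grids; the grids themselves and the helper names `load_pairs` and `shape` are invented for illustration and are not taken from the talk.

```python
import json

# A tiny hand-written task in the public ARC layout.
# The grids are invented for illustration only.
TASK_JSON = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}
  ],
  "test": [
    {"input": [[0, 0], [1, 1]]}
  ]
}
"""


def load_pairs(task_text):
    """Return (train_pairs, test_inputs) from an ARC-style JSON string."""
    task = json.loads(task_text)
    train_pairs = [(ex["input"], ex["output"]) for ex in task["train"]]
    test_inputs = [ex["input"] for ex in task["test"]]
    return train_pairs, test_inputs


def shape(grid):
    """Rows and columns of a rectangular grid."""
    return len(grid), len(grid[0])


train_pairs, test_inputs = load_pairs(TASK_JSON)
for initial, target in train_pairs:
    print(shape(initial), "->", shape(target))
```

Each training pair is an (initial, target) tuple of integer grids, which is the raw material any prompt or solver has to work from.
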
What rules do you need?

First solution
Llama.cpp with quantised Llama 8B (and 70B)
Python llama.cpp bindings
Ask for 200 solutions
Try grid, list, grid+list representations
Grid only: poor. List: better. Grid+list: slightly better

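The three representations could be sketched roughly as below. This is an assumption about what "grid", "list" and "grid+list" might look like as prompt text; the exact formats and prompt wording used in the talk are not given, and `grid_as_text`, `grid_as_list`, `describe`, `build_prompt` and the target function name `transform` are all hypothetical. The model call itself is left out.

```python
def grid_as_text(grid):
    """Render a grid as rows of digits, one row per line."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def grid_as_list(grid):
    """Render a grid as a Python list-of-lists literal."""
    return repr(grid)


def describe(grid, style):
    """Render a grid in the 'grid', 'list' or 'grid+list' style."""
    if style == "grid":
        return grid_as_text(grid)
    if style == "list":
        return grid_as_list(grid)
    if style == "grid+list":
        return grid_as_text(grid) + "\n" + grid_as_list(grid)
    raise ValueError(f"unknown style: {style}")


def build_prompt(train_pairs, style):
    """Assemble a prompt asking for code that maps initial to target."""
    parts = ["Write a Python function transform(initial) that returns the target."]
    for n, (initial, target) in enumerate(train_pairs, start=1):
        parts.append(f"Example {n} initial:\n{describe(initial, style)}")
        parts.append(f"Example {n} target:\n{describe(target, style)}")
    return "\n\n".join(parts)


pairs = [([[0, 1], [1, 0]], [[1, 0], [0, 1]])]
print(build_prompt(pairs, "grid+list"))
```

The same prompt would then be sent to the model many times (200 in the talk) with sampling enabled, so that each response is a different candidate solution.
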
Llama (normally) writes code
Bad syntax, no code, raw_input, injection back into the training data (changing ints to strings)

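One way to triage generated candidates against these failure modes is sketched below. It is not the checker used in the talk: the labels, the function `classify_candidate` and the expectation that the model defines a function called `transform` are assumptions made for illustration. Note that `exec` is not a sandbox, so untrusted code should only be run in an isolated environment.

```python
import ast


def classify_candidate(source, train_pairs):
    """Return a short label describing how a generated candidate behaves."""
    if "def " not in source:
        return "no code"
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return "bad syntax"
    # Reject candidates that reference interactive input functions.
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in {"raw_input", "input"}:
            return "asks for input"
    namespace = {}
    try:
        # Not a sandbox: only run this in an isolated environment.
        exec(compile(tree, "<candidate>", "exec"), namespace)
        transform = namespace["transform"]  # hypothetical required name
        for initial, target in train_pairs:
            # Pass a copy of the rows so the candidate cannot alter the examples.
            result = transform([list(row) for row in initial])
            # Strict equality: string cells such as "1" do not match int 1.
            if result != target:
                return "wrong answer"
    except Exception:
        return "runtime error"
    return "correct"


pairs = [([[0, 1], [1, 0]], [[1, 0], [0, 1]])]
good = "def transform(initial):\n    return [[1 - c for c in row] for row in initial]\n"
print(classify_candidate(good, pairs))
```

Counting the labels over all sampled candidates gives a rough picture of how often the model produces usable code for a given prompt representation.
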
Summary
Llama 3 8B IQ2 (heavy quant), some solutions run correctly on the 3x3 “train” problem
Very fast, runs on a 3090 (24GB VRAM)
Do you use Llama 3? Alpaca? ROPE?
Do you have text correctness metrics?
