August 06, 2024

Failing to reason with LLMs (ARC AGI kaggle update with Llama3)

A lightning talk at PyDataLondon in August 2024. I spoke about the Kaggle ARC AGI competition and how I've pushed on with Program as Thought, plus the ideas behind Self Consistency and CRITIC, to improve the model's code-writing capability.
https://www.meetup.com/pydata-london-meetup/events/302434648/

ianozsvald

Transcript

Abstractly reasoning – failing with an LLM (next steps for ARC AGI) PyDataLondon 2024-08 lightning talk @IanOzsvald – ianozsvald.com
Can LLMs reason? ARC AGI Abstract JSON “initial → target”
Tried “don’t code, just reason” Llama3 70B pretty smart Llama3 8B writes code pretty well, sometimes Abstraction & Reasoning Challenge By [ian]@ianozsvald[.com] Ian Ozsvald
30% solutions pretty good! It counts!
Comments! Reasonable numpy! Correct substitution!
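
The slides above describe Llama3 writing numpy code that performs a correct substitution on a grid. A minimal sketch of how such a generated program might be checked against a task's example pairs is shown here. The task dictionary follows the public ARC layout (`train` pairs, each with an `input` and an `output` grid); the grid values, the `transform` function, and the `score_candidate` helper are hypothetical illustrations and are not the code from the talk.

```python
import numpy as np

# Hypothetical ARC-style task: every 1 becomes a 2 (a simple substitution).
task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[0, 2], [2, 0]]},
        {"input": [[1, 1], [0, 0]], "output": [[2, 2], [0, 0]]},
    ]
}


def transform(initial):
    """Stand-in for a function an LLM might write: replace 1 with 2."""
    grid = np.array(initial)
    result = grid.copy()
    result[grid == 1] = 2
    return result


def score_candidate(fn, pairs):
    """Return the fraction of example pairs that fn reproduces exactly."""
    correct = 0
    for pair in pairs:
        try:
            predicted = np.array(fn(pair["input"]))
        except Exception:
            continue  # generated code may crash; count that as a miss
        if np.array_equal(predicted, np.array(pair["output"])):
            correct += 1
    return correct / len(pairs)


score = score_candidate(transform, task["train"])
print(score)
```

Scoring on exact grid equality is a deliberate choice in this sketch: a nearly-right grid still counts as a miss, which keeps convincing but wrong programs from looking like successes.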
Convincing weirdness
Big issue – it gets stuck on the same ideas
Next steps: get an LLM to read lots of failed model outputs and summarise them, then maybe I could ask it to make new strategies? Notes → NotANumber.email newsletter
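
One way the idea of summarising failed outputs could be prototyped is sketched here. It only assembles the prompt text and does not call a model. The `failed_attempts` records, their `idea` and `error` fields, and the `build_summary_prompt` helper are all assumptions made for illustration; the talk does not specify how failures are logged or what the prompt would say.

```python
from collections import Counter

# Hypothetical log of failed attempts; in practice these would be
# collected from running the generated programs.
failed_attempts = [
    {"idea": "flip the grid horizontally", "error": "output mismatch"},
    {"idea": "flip the grid horizontally", "error": "output mismatch"},
    {"idea": "replace colour 1 with 2", "error": "wrong shape"},
]


def build_summary_prompt(attempts, max_items=10):
    """Group repeated ideas and ask for strategies that differ from them."""
    counts = Counter(attempt["idea"] for attempt in attempts)
    lines = [
        f"- {idea} (attempts: {n})"
        for idea, n in counts.most_common(max_items)
    ]
    return (
        "These strategies were tried and failed:\n"
        + "\n".join(lines)
        + "\nSummarise why they failed, then propose strategies "
        "that differ from all of them."
    )


prompt = build_summary_prompt(failed_attempts)
print(prompt)
```

Counting repeated ideas makes the "stuck on the same ideas" problem visible in the prompt itself, so the model is told which strategies it keeps returning to.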
