important than ever. Generative AI (in science) is here.

Experimentation: the ‘AI Scientist’ goes from idea → ‘reviewed’ paper
- Generates ideas, writes code, executes experiments, analyzes results, authors the paper, and simulates reviews. Costs about $15 per output paper.
- Experiments currently limited to machine learning: restricted to domains where the ‘experiment’ is code/numerical (diffusion modeling, learning dynamics, transformer-based language modeling).
- Results can be interesting, but with many caveats: the system creates interesting ideas but struggles with implementation, rigor, and accuracy due to limitations in computation, experimental depth, and current model capabilities.*

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2024-09-04)
Chris Lu1,2,*, Cong Lu3,4,*, Robert Tjarko Lange1,*, Jakob Foerster2,†, Jeff Clune3,4,5,† and David Ha1,†
*Equal Contribution, 1Sakana AI, 2FLAIR, University of Oxford, 3University of British Columbia, 4Vector Institute, 5Canada CIFAR AI Chair, †Equal Advising

One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models (LLMs) to perform research independently and communicate their findings. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation.
In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion and add them to a growing archive of knowledge, acting like the human scientific community. We demonstrate the versatility of this approach by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics. Each idea is implemented and developed into a full paper at a meager cost of less than $15 per paper, illustrating the potential for our framework to democratize research and significantly accelerate scientific progress. To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer. This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world’s most challenging problems. Our code is open-sourced at https://github.com/SakanaAI/AI-Scientist.

1. Introduction

The modern scientific method (Chalmers, 2013; Dewey, 1910; Jevons, 1877) is arguably one of the greatest achievements of the Enlightenment. Traditionally, a human researcher collects background knowledge, drafts a set of plausible hypotheses to test, constructs an evaluation procedure, collects evidence for the different hypotheses, and finally assesses and communicates their findings. Afterward, the resulting manuscript undergoes peer review and subsequent iterations of refinement. This procedure has led to countless breakthroughs in science and technology, improving human quality of life.
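The pipeline the abstract describes (idea → code → experiments → figures → paper → simulated review, accumulated in a growing archive) can be sketched as a simple loop. This is a minimal illustrative sketch only: every name here (StubLLM, run_experiments, ai_scientist_loop, and so on) is a hypothetical stand-in, not the actual API of the open-sourced AI-Scientist codebase.

```python
# Hypothetical sketch of the open-ended loop described in the abstract.
# StubLLM stands in for a frontier-model backend; all names are illustrative.

class StubLLM:
    """Placeholder for an LLM backend (ideation, coding, writing, reviewing)."""

    def generate_idea(self, context):
        # A real system would condition on the archive to avoid repeats.
        return {"title": f"Idea {len(context) + 1}"}

    def write_code(self, idea):
        return "print('experiment')"  # generated experiment script

    def write_paper(self, idea, results):
        return {"title": idea["title"], "results": results}  # full manuscript

    def simulated_review(self, paper):
        return {"score": 5, "decision": "Reject"}  # conference-style verdict


def run_experiments(code):
    """Execute the generated experiment code and collect metrics (stubbed)."""
    return {"metric": 0.0}


def ai_scientist_loop(llm, n_iterations=3):
    """Repeat the pipeline, growing an open-ended archive of papers."""
    archive = []
    for _ in range(n_iterations):
        idea = llm.generate_idea(context=archive)   # novel research idea
        code = llm.write_code(idea)                 # experiment implementation
        results = run_experiments(code)             # execute and analyze
        paper = llm.write_paper(idea, results)      # write up the findings
        review = llm.simulated_review(paper)        # automated peer review
        archive.append({"paper": paper, "review": review})
    return archive


archive = ai_scientist_loop(StubLLM())
print(len(archive))  # three papers accumulated in the archive
```

The key design point the paper emphasizes is that the loop is open-ended: each iteration conditions on the archive of prior papers and reviews, so ideas can build on earlier ones rather than starting from scratch.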
However, this iterative process is inherently limited by human researchers’ ingenuity, background knowledge, and finite time. Attempting to automate general scientific discovery (Langley, 1987, 2024; Waltz and Buchanan, 2009) has been a long-standing ambition of the community since at least the early 1970s, with computer-assisted works like the Automated Mathematician (Lenat, 1977; Lenat and Brown, 1984) and DENDRAL (Buchanan and Feigenbaum, 1981). In the field of AI, researchers have envisioned the possibility of automating AI research using AI itself (Ghahramani, 2015; Schmidhuber, 1991, 2010a,b, 2012), leading to “AI-generating algorithms” (Clune, 2019). More recently, foundation models have seen tremendous advances in their general capabilities (Anthropic, 2024; Google DeepMind Gemini Team, 2023; Llama Team, 2024; OpenAI, 2023), but they have only been shown to accelerate individual parts of the research pipeline, e.g. the writing of scientific manuscripts (Altmäe et al., 2023;

Corresponding author(s): Chris Lu (
[email protected]), Cong Lu ([email protected]), and Robert Tjarko Lange ([email protected])

arXiv:2408.06292v3 [cs.AI] 1 Sep 2024

* “GPT-4o in particular frequently fails to write LaTeX that compiles.”