Toward AI That Can Do Machine Learning Research

Shiro Takagi
October 05, 2024

Transcript

  1. [Figure: timeline of AI for science and research automation. Time and era labels: 1900s, 2000s, 2012, 2017, 2022; Robot, Computer, AI, ML, DNN, Transformer, ChatGPT.]

    Systems and projects shown: Dendral, BACON, Adam, AlphaFold, AI Scientist, MLAgent, ChemCrow, Coscientist, MOOSE, prompt2model, Curious Agent, AI Feynman, Galactica, ReviewRobot, PaperRobot, MLR-Copilot, AlphaGeometry, data2paper, WINGS, Mahoro, Solevent, SemNet, DISK, AM, Logic Theorist, Automatic Statistician, Eve, ...
    Research fields shown: Automated Theorem Proving, SciML, Physics Informed ML, Laboratory Automation, Scientific Workflow, Program Synthesis, Scholarly Document Processing, Automated Experimental Design, Literature Based Discovery, Symbolic Regression, Geometric DL, Bayes for Science, Neural Operator, Scientific Claim Verifi., AutoML, MLOps, ...
    Concepts shown: Nobel Turing Challenge, AI for Science, 4thScience, 3rdScience [Wang+ 2023].
    Note on the figure: this figure is not exhaustive. The choice of fields and papers strongly reflects the author's own preferences and subjectivity, and the chronology may not be strictly accurate, so please treat it as a rough reference only.
  2. [Figure: same timeline figure as slide 1]
  3. [Figure: same timeline figure as slide 1, with the same labels extracted in a different order]
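  4. Most AI for Science / Research Automation efforts... automate partial tasks of research, not the end-to-end research process from idea generation through knowledge creation

    1. Humans rigidly fix in advance the actions the machine takes, the concrete tasks it must solve, and the space of candidate solutions, so the systems are neither open-ended nor autonomous 2. → AI/machines as tools for science, rather than AI scientists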
  5. [Figure: same timeline figure as slide 1, with the added label "End-to-End automated execution"]
  6. [Figure: same timeline figure as slide 1, with the added label "Open-Ended / autonomy"]
  7. [Figure: same timeline figure as slide 1]
  8. Research posted to arXiv since August 2024. The tide has turned since the AI Scientist appeared!

    Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
    The emergence of Large Language Models (LLM) as a tool in literature reviews: an LLM automated systematic review
    Towards Automated Machine Learning Research
    Can Large Language Models Unlock Novel Scientific Research Ideas?
    Automating the Practice of Science -- Opportunities, Challenges, and Implications
    Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
    Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations
    Automated Design of Agentic Systems
    CodeRefine: A Pipeline for Enhancing LLM-Generated Code Implementations of Research Papers
    MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents
    SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning
    ....!!!
  9. ML research is an activity built from combinations of basic operations, and the ability to perform these basic operations has improved dramatically with the arrival of LLMs

    [Figure labels: LLMs & LLM Agents ...; Reasoning, Planning, Thinking, Tool Use, Scholarly Document Processing, Coding, Computer Operation, ...] e.g. [Hou et al, 2023] e.g. [Huang et al, 2023, Huang et al, 2024, Mialon et al, 2023, ...] e.g. [Zhao et al, 2023, Wang 2023, ...]
  10. [Figure: research process diagram with the nodes Objective, Research Problem, Solution Idea, Solution Implementation, Experiment Plan, Experiment Implementation, Experiment Result, Research Paper]

    Idea generation / question generation / problem discovery
  11. ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [Baek+ 2024]

    Scideator: Human-LLM Scientific Idea Generation Grounded in Research-Paper Facet Recombination [Radensky+ 2024]
    OpenResearcher: Unleashing AI for Accelerated Scientific Research [Zheng+ 2024]
    Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models [Gu & Krenn 2024]
    SCIMON: Scientific Inspiration Machines Optimized for Novelty [Wang+ 2023]
    AutoML-GPT: Automatic Machine Learning with GPT [Zhang+ 2023]
    Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [Yang+ 2023]
    SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning [Ghafarollahi & Buehler 2024]
    Creative research question generation for human-computer interaction research [Liu+ 2023]
    Mapping the challenges of hci: An application and evaluation of chatgpt and gpt-4 for cost-efficient question answering [Oppenlaender & Hamalainen 2023]
    Evaluating the use of large language model in identifying top research questions in gastroenterology [Lahat+ 2023]
    ... and more !!
    [Figure credit: Baek+ 2024]
  12. [No text extracted from this slide]

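  13. # Detailed Timeline of the Development of Charles Darwin's Theory of Evolution

    ## 1809
    - February 12: Charles Darwin is born in Shrewsbury, England
    ## 1825-1827
    - Studies medicine at the University of Edinburgh but leaves without completing the course
    - Deepens his interest in natural science
    ## 1828-1831
    - Enters Christ's College, University of Cambridge
    - Mentored by the botanist John Henslow; his interest in biology grows
    ## 1831-1836: Voyage of the Beagle
    - Joins the voyage around the world as a naturalist
    - Key observations:
      1. Discovers fossils of extinct giant mammals in South America and notes their similarity to living species
      2. Finds that the beak shapes of Galápagos finches differ from island to island
      3. Observes island-to-island differences in the shell shapes of Galápagos giant tortoises
    - These observations become the basis for the idea that species are mutable
    ## 1837
    - July: begins writing his first evolution notebook (the "Red Notebook")
    - Records his early thoughts on the transformation of species
    ## 1838
    - September: reads Thomas Malthus's "An Essay on the Principle of Population" and conceives the concept of natural selection
    - Develops the concept of the struggle for existence
    https://www.kousakusha.co.jp/BOOK/ISBN978-4-87502-417-0.html
    ## 1842
    - Summarizes a first outline of his theory of evolution in a 35-page draft
    - Uses the term "natural selection" for the first time
    - Integrates the concepts of variation, inheritance, overproduction, and the struggle for existence
    ## 1844
    - Writes a more detailed 230-page draft of his theory of evolution
    - Explains the process of evolution by natural selection in more detail
    - Entrusts the draft to his wife Emma with instructions to publish it after his death
    ## 1846-1854
    - Works mainly on cirripedes (barnacles and their relatives)
    - Deepens his understanding of the range of variation within species and of the mechanisms of inheritance
    ## 1854
    - Resumes work in earnest on evolution by natural selection
    - Collects cases from livestock breeding and explores the analogy between artificial and natural selection
    ## June 1858
    - Receives a paper from Alfred Russel Wallace describing a similar theory of evolution
    - July 1: Wallace's paper and Darwin's own summary are presented jointly at the Linnean Society
    ## 1859
    - November 24: "On the Origin of Species" is published

    People go through a great deal of trial and error to produce a single idea, or arrive at an idea in a sudden moment by drawing on many different sources of information, but AI cannot yet generate ideas in this way. AI also cannot yet generate useful, concrete, deep, research-level ideas for achieving a given goal. ... and so on!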
  14. [Figure: same research process diagram as slide 10]

    Idea realization / method proposal
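  15. [Figure labels: search space; method proposal / discovery; method evaluation. *Based on [Hu+ 2024]] Open-ended attempts are on the rise!

    Attempts at automation have existed for a long time. Improvements in LLM capabilities have made it possible to attempt automatic construction / proposal / discovery of methods in open-ended search spaces!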
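The three labels on this slide (search space, method proposal, method evaluation) can be read as a propose-and-evaluate loop. The following is only a minimal sketch of that loop under loud assumptions: the search space is a toy language of four hypothetical primitive operations, the proposer is a random mutation standing in for an LLM, and `MAX_LEN`, `PRIMITIVES`, `propose`, `evaluate`, and `search` are names invented for this illustration. It is not the method of [Hu+ 2024] or of any system cited in this deck.

```python
import random

# Hypothetical search space: a "method" is a sequence of primitive operations.
PRIMITIVES = {
    "inc": lambda v: v + 1,
    "double": lambda v: v * 2,
    "square": lambda v: v * v,
    "neg": lambda v: -v,
}
MAX_LEN = 8  # assumption: cap program length to keep the toy example cheap


def run(program, x):
    """Apply each operation of the program to x, in order."""
    for op in program:
        x = PRIMITIVES[op](x)
    return x


def evaluate(program, examples):
    """Method evaluation: negative total absolute error (higher is better)."""
    return -sum(abs(run(program, x) - y) for x, y in examples)


def propose(archive, rng):
    """Method proposal: mutate an archived program (stand-in for an LLM)."""
    _, parent = rng.choice(archive)
    child = list(parent)
    ops = list(PRIMITIVES)
    roll = rng.random()
    if child and roll < 0.3:
        child.pop(rng.randrange(len(child)))  # delete one operation
    elif child and (roll < 0.6 or len(child) >= MAX_LEN):
        child[rng.randrange(len(child))] = rng.choice(ops)  # swap one operation
    else:
        child.insert(rng.randrange(len(child) + 1), rng.choice(ops))  # add one
    return tuple(child)


def search(examples, steps=2000, seed=0, keep=20):
    """Alternate proposal and evaluation, keeping the best programs found."""
    rng = random.Random(seed)
    archive = [(evaluate((), examples), ())]  # start from the empty program
    for _ in range(steps):
        candidate = propose(archive, rng)
        archive.append((evaluate(candidate, examples), candidate))
        archive.sort(reverse=True)
        archive = archive[:keep]
    return archive[0]


if __name__ == "__main__":
    examples = [(0, 2), (1, 4), (2, 6), (3, 8)]  # target behaviour: (x + 1) * 2
    best_score, best_program = search(examples)
    print(best_score, best_program)
```

The archive only ever keeps the highest-scoring programs, so the best score cannot get worse over time. Whether the loop actually reaches a perfect program depends on the seed and the number of steps.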
  16. [Figure labels: ???; Complicated/Concrete Idea; Simple/Abstract Idea; Brain-inspired AI; AI model inspired by visual information processing; ...; ???; Papers; Mathematical Model; ???; Code Implementation; [Fukushima 1980]; (CodeRefine: A Pipeline for Enhancing LLM-Generated Code Implementations of Research Papers [Trofimova+ 2024])]

    Developing a simple idea into a complex method in an open-ended way is not yet possible. Constructing complex methods that achieve a goal or solve a problem is not yet possible.
  17. [Figure: same research process diagram as slide 10]

    Experiment planning / execution
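  18. AI Scientist [Lu+ 2024]: the AI edits and runs experiment and visualization code prepared by humans, and produces high-quality experimental results

    MLR-Copilot [Li+ 2024]: automates the creation and execution of experiment code, including retrieval of experiment templates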
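  19. [Figure: experiment file layout. Files and directories: models, datasets, proposed_solution.py, experiment.py, experiment_plan.txt]

    Code comments shown: "# construct & compare methods", "# data & baseline preparation", "from modules import proposed_solution", "# evaluation"; output: experiment results.
    Annotations: "Turn the experiment plan into a Python file"; "Select the models and data for doing ~, and preprocess them".
    Class skeleton shown: class Algorithm: def train_model(): ... ; class NewAlgorithm(Algorithm): def train_model(): ...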
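The class skeleton on this slide (a baseline `Algorithm` and a `NewAlgorithm` that overrides `train_model`) can be turned into a small runnable experiment file. The sketch below is only an illustration of that layout. The synthetic data, the mean-predictor baseline, the least-squares line fit, and the helper names `prepare_data`, `evaluate`, and `run_experiment` are assumptions made for this example and do not appear in the slides.

```python
import random
from statistics import mean


class Algorithm:
    """Baseline method (assumed here): always predict the mean training target."""

    def train_model(self, xs, ys):
        self.prediction = mean(ys)
        return self

    def predict(self, x):
        return self.prediction


class NewAlgorithm(Algorithm):
    """Proposed method (assumed here): least-squares line, overriding train_model."""

    def train_model(self, xs, ys):
        x_bar, y_bar = mean(xs), mean(ys)
        var = sum((x - x_bar) ** 2 for x in xs)
        cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope = cov / var
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, x):
        return self.slope * x + self.intercept


def prepare_data(n=40, seed=0):
    # data & baseline preparation: synthetic y = 2x + 1 plus small noise
    rng = random.Random(seed)
    xs = [i / n for i in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 0.05) for x in xs]
    train = (xs[0::2], ys[0::2])  # even indices for training
    test = (xs[1::2], ys[1::2])   # odd indices for testing
    return train, test


def evaluate(model, xs, ys):
    # evaluation: mean squared error on held-out data
    return mean((model.predict(x) - y) ** 2 for x, y in zip(xs, ys))


def run_experiment():
    # construct & compare methods
    train, test = prepare_data()
    results = {}
    for algo in (Algorithm(), NewAlgorithm()):
        algo.train_model(*train)
        results[type(algo).__name__] = evaluate(algo, *test)
    return results


if __name__ == "__main__":
    for name, mse in run_experiment().items():
        print(f"{name}: test MSE = {mse:.4f}")
```

Because both methods share the `train_model` / `predict` interface, the comparison loop does not need to change when a new subclass is added. That is one plausible reason for expressing a proposed method as a subclass of the baseline.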
  20. [Figure: same research process diagram as slide 10]

    Analysis of experimental results
  21. There are many attempts to automate data analysis, and fairly complex pipelines are starting to be proposed [Figure credits: Hong+ 2024, Ifargan+ 2024]

    DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning [Guo+ 2024]
    JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization [Liu+ 2023]
    Autonomous LLM-driven research from data to human-verifiable research papers [Ifargan+ 2024]
    Data Interpreter: An LLM Agent For Data Science [Hong+ 2024]
    Towards Automated Data Sciences with Natural Language and SageCopilot: Practices and Lessons Learned [Liao+ 2024]
    Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations [Liu+ 2024]
    ...
  22. [Figure: same research process diagram as slide 10]

    The research process as a whole
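  23. AI Scientist [Lu+ 2024]: automates the process including paper writing and paper reviewing; the generated papers are of high quality

    MLR-Copilot [Li+ 2024]: automatically generates research ideas from a set of papers; also automates the creation of experiment code, including retrieval of experiment templates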
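  24. AI that produces research results on par with humans, with the same degree of autonomy as humans, does not exist yet

    Tool → AI Scientist is a matter of autonomy: how far can research proceed without human intervention or design?
    Humans set their own objectives, change what they do based on experimental results, and take on research that spans years... (long-horizon, open-ended, adaptive, ...)
    In practice, the work is to advance the underlying technologies and to flow-engineer the research process so that it makes the most of them.
    [Huang+ 2024] [Lu+ 2024] [Koichi Takahashi (高橋恒一), "万能知能と科学の主観性と自律性、およびその変質" (Universal intelligence and the subjectivity and autonomy of science, and their transformation), 科学基礎論学会 2024]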
  25. DENDRAL: a case study of the first expert system for scientific hypothesis formation [Lindsay+ 1993]

    Scientific discovery: Computational explorations of the creative processes [Langley+ 1987]
    Curious Model-Building Control Systems [Schmidhuber 1991]
    Automated theory formation in mathematics [Lenat 1977]
    The logic theory machine: A complex information processing system [Newell & Simon 1956]
    Functional genomic hypothesis generation and experimentation by a robot scientist [King+ 2004]
    Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases [Williams+ 2015]
    A semantic framework for automatic generation of computational workflows using distributed data and component catalogs [Gil+ 2011]
    The Automatic Statistician [Steinruecken+ 2019]
    Towards continuous scientific data analysis and hypothesis evolution [Gil+ 2017]
    Robotic crowd biology with maholo labdroids [Yachie & Natsume 2017]
    SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers [Chan+ 2018]
    Attention Is All You Need [Vaswani+ 2017]
    AI Feynman: a Physics-Inspired Method for Symbolic Regression [Udrescu & Tegmark 2019]
    PaperRobot: Incremental Draft Generation of Scientific Ideas [Wang+ 2019]
    Predicting research trends with semantic and neural networks with an application in quantum physics [Krenn & Zeilinger 2019]
    ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis [Wang+ 2020]
    Improved protein structure prediction using potentials from deep learning [Senior+ 2020]
    Galactica: A Large Language Model for Science [Taylor+ 2022]
    MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation [Huang+ 2023]
    ChemCrow: Augmenting large-language models with chemistry tools [Bran+ 2023]
    Autonomous chemical research with large language models [Boiko+ 2023]
    Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [Yang+ 2023]
    Prompt2Model: Generating Deployable Models from Natural Language Instructions [Viswanathan+ 2023]
    The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery [Lu+ 2024]
    MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents [Li+ 2024]
    Solving olympiad geometry without human demonstrations [Trinh+ 2024]
    Autonomous LLM-driven research from data to human-verifiable research papers [Ifargan+ 2024]
  26. Scientific discovery in the age of artificial intelligence [Wang+ 2023]

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems [Zhang+ 2023]
    Automating science [Waltz 2009]
    Nobel Turing Challenge: creating the engine for scientific discovery [Kitano 2021]
    Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance [Lin+ 2024]
    The emergence of Large Language Models (LLM) as a tool in literature reviews: an LLM automated systematic review [Scherbakov+ 2024]
    Towards Automated Machine Learning Research [Ardeshir+ 2024]
    Can Large Language Models Unlock Novel Scientific Research Ideas? [Kumar+ 2024]
    Automating the Practice of Science -- Opportunities, Challenges, and Implications [Musslick+ 2024]
    Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers [Si+ 2024]
    Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations [Liu+ 2024]
    Automated Design of Agentic Systems [Hu+ 2024]
    CodeRefine: A Pipeline for Enhancing LLM-Generated Code Implementations of Research Papers [Trofimova+ 2024]
    SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning [Ghafarollahi+ 2024]
    Survey of Large Language Models [Zhao+ 2023]
    A Survey on Large Language Model based Autonomous Agents [Wang+ 2023]
    Towards Reasoning in Large Language Models: A Survey [Huang+ 2023]
    Augmented Language Models: a Survey [Mialon+ 2023]
    Large Language Models for Software Engineering: A Systematic Literature Review [Hou+ 2023]
    Understanding the planning of LLM agents: A survey [Huang+ 2024]
    ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [Baek+ 2024]
    Scideator: Human-LLM Scientific Idea Generation Grounded in Research-Paper Facet Recombination [Radensky+ 2024]
    OpenResearcher: Unleashing AI for Accelerated Scientific Research [Zheng+ 2024]
    Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models [Gu & Krenn 2024]
    SCIMON: Scientific Inspiration Machines Optimized for Novelty [Wang+ 2023]
    AutoML-GPT: Automatic Machine Learning with GPT [Zhang+ 2023]
    Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [Yang+ 2023]
  27. SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning [Ghafarollahi & Buehler 2024]

    Creative research question generation for human-computer interaction research [Liu+ 2023]
    Mapping the challenges of hci: An application and evaluation of chatgpt and gpt-4 for cost-efficient question answering [Oppenlaender & Hamalainen 2023]
    Evaluating the use of large language model in identifying top research questions in gastroenterology [Lahat+ 2023]
    Neural Architecture Search: Insights from 1000 Papers [White+ 2023]
    Symbolic Discovery of Optimization Algorithms [Chen+ 2023]
    Discovering Preference Optimization Algorithms with and for Large Language Models [Lu+ 2024]
    Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position [Fukushima 1980]
    AutoML-GPT: Automatic Machine Learning with GPT [Zhang+ 2023]
    DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning [Guo+ 2024]
    JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization [Liu+ 2023]
    Autonomous LLM-driven research from data to human-verifiable research papers [Ifargan+ 2024]
    Data Interpreter: An LLM Agent For Data Science [Hong+ 2024]
    Towards Automated Data Sciences with Natural Language and SageCopilot: Practices and Lessons Learned [Liao+ 2024]
    Towards Fully Autonomous Research Powered by LLMs: Case Study on Simulations [Liu+ 2024]
    OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments [Xie+ 2024]
    The future of fundamental science led by generative closed-loop artificial intelligence [Zenil+ 2023]
    Speculative Exploration on the Concept of Artificial Agents Conducting Autonomous Research [Takagi 2023]
    Collective Predictive Coding as Model of Science: Formalizing Scientific Activities Towards Generative Science [Taniguchi+ 2024]
    Evolutionary principles in self-referential learning [Schmidhuber 1987]