Measuring GitHub Copilot's Impact on Productivity

DOI: 10.1145/3633453

Case study asks Copilot users about its impact on their productivity, and seeks to find their perceptions mirrored in user data.

BY ALBERT ZIEGLER, EIRINI KALLIAMVAKOU, X. ALICE LI, ANDREW RICE, DEVON RIFKIN, SHAWN SIMISTER, GANESH SITTAMPALAM, AND EDWARD AFTANDILIAN

Key insights:
- AI pair-programming tools such as GitHub Copilot have a big impact on developer productivity. This holds for developers of all skill levels, with junior developers seeing the largest gains.
- The reported benefits of receiving AI suggestions while coding span the full range of typically investigated aspects of productivity, such as task time, product quality, cognitive load, enjoyment, and learning.
- Perceived productivity gains are reflected in objective measurements of developer activity.
- While suggestion correctness is important, the driving factor for these improvements appears to be not correctness as such, but whether the suggestions are useful as a starting point for further development.

Code-completion systems offering suggestions to a developer in their integrated development environment (IDE) have become the most frequently used kind of programmer assistance [1]. When generating whole snippets of code, they typically use a large language model (LLM) to predict what the user might type next (the completion) from the context of what they are working on at the moment (the prompt) [2]. This allows for completions at any position in the code, often spanning multiple lines at once; a minimal sketch of this prompt-to-completion loop appears at the end of this section.

The potential benefits of generating large sections of code automatically are huge, but evaluating these systems is challenging. Offline evaluation, where the system is shown a partial snippet of code and then asked to complete it, is difficult, not least because for longer completions there are many acceptable alternatives and no straightforward mechanism for labeling them automatically [5]. An additional step taken by some researchers [3,21,29] is to use online evaluation and track the frequency of real users accepting suggestions, assuming that the more contributions a system makes to the developer's code, the higher its benefit. The validity of this assumption is not obvious when considering issues such as whether two short completions are more valuable than one long one, or whether reviewing suggestions can be detrimental to programming flow. Both styles of evaluation are sketched at the end of this section.

Code completion in IDEs using language models was first proposed by Hindle et al. [9], and today neural synthesis tools such as GitHub Copilot, CodeWhisperer, and TabNine suggest code snippets within an IDE with the explicitly stated intention of increasing a user's productivity. Developer productivity has many aspects, and a recent study has shown that tools like these are helpful in ways that are only partially reflected by measures such as completion times for standardized tasks [23].(a) Alternatively, we can leverage the developers themselves as expert assessors of their own productivity. This meshes well with current thinking in software engineering research, which suggests measuring productivity on multiple dimensions and using self-reported data [6]. Thus, we focus on studying perceived productivity.

Here, we investigate whether usage measurements of developer interactions with GitHub Copilot can predict perceived productivity as reported by developers. We analyze 2,631 survey responses from developers using GitHub Copilot and match them to usage measurements collected from the IDE.

(a) Nevertheless, such completion times are greatly reduced in many settings, often by more than half [16].
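To make the prompt-and-completion terminology concrete, the following is a minimal sketch of the loop described above. The function names and the stubbed `query_model` are illustrative assumptions, not GitHub Copilot's actual interface.

```python
# Minimal sketch of an LLM-backed completion loop. `query_model` is a
# hypothetical stand-in for a code-generation model, not Copilot's API.

def build_prompt(file_text: str, cursor: int, window: int = 2048) -> str:
    """Use the code immediately preceding the cursor as the model context."""
    return file_text[max(0, cursor - window):cursor]

def query_model(prompt: str) -> str:
    # Placeholder: a real system would call an LLM service here.
    return "    return a + b\n"

def suggest_completion(file_text: str, cursor: int) -> str:
    prompt = build_prompt(file_text, cursor)
    return query_model(prompt)

source = "def add(a, b):\n"
print(suggest_completion(source, cursor=len(source)), end="")
```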
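The offline-evaluation problem noted above can be seen in miniature: scoring against a single reference completion rejects alternatives a developer would happily accept. The reference and candidates here are invented for illustration.

```python
# Why exact-match offline evaluation undercounts: both candidates compute
# the mean of xs, but only one matches the reference string.

reference = "return sum(xs) / len(xs)"
candidates = [
    "return sum(xs) / len(xs)",                     # literal match
    "total = sum(xs)\nreturn total / len(xs)",      # acceptable, no match
]

for cand in candidates:
    print(cand == reference, repr(cand))
```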
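The online metric, the frequency with which shown suggestions are accepted, reduces to a simple ratio. The event records below are a hypothetical format for illustration, not the study's telemetry.

```python
# Acceptance rate: the share of shown suggestions that the user accepts.
# Event records are a hypothetical illustration.

events = [
    {"suggestion": 1, "accepted": True},
    {"suggestion": 2, "accepted": False},  # shown but rejected
    {"suggestion": 3, "accepted": True},
]

shown = len(events)
accepted = sum(e["accepted"] for e in events)
print(f"acceptance rate: {accepted / shown:.2f}")  # 2 of 3 -> 0.67
```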
[Illustration by Justin Metz. Sidebar links: GitHub Copilot (Comm. ACM 67(3), https://doi.org/10.1145/3633453); GitHub Next / Copilot Labs (https://githubnext.com/)]