Dr. Neil Gunther
March 14, 2025

Does the Efficient Compute Frontier Represent New Physics?

The so-called "Efficient Compute Frontier" (ECF) refers to an apparent hard constraint on the achievable error reduction as a function of the amount of computational work incurred when processing pre-training data for LLMs (large language models). The Artificial Intelligence (AI) community has asked whether this previously unknown and unexpected constraint represents some kind of fundamental law of nature.

We present a model of LLM neural-network dynamics that exhibits power-law behavior and matches the ECF constraint, C_{min}(N) = a N^{-b}. The prefactor a = 0.00000001 sets the scale of the neural-network connections, viz., on the order of billions, while the exponent b = 0.05 is indicative of subnetwork correlations that are much stronger than Zipf's law. In this way, we are able to answer the original question in the negative.

Transcript

  1. Does the Efficient Compute Frontier Represent New Physics? Instantons in the Machine

    Dr. Neil J. Gunther, Performance Dynamics Research. APS Global Physics Summit, Anaheim, California, March 17, 2025. © 2025 Performance Dynamics Research. Does the Efficient Compute Frontier Represent New Physics? March 19, 2025. 1 / 21
  2. Abstract

    The so-called “Efficient Compute Frontier” (ECF) refers to an apparent hard constraint on the achievable error reduction as a function of the amount of computational work incurred when processing pre-training data for LLMs (large language models). The Artificial Intelligence (AI) community has asked whether this previously unknown and unexpected constraint represents some kind of fundamental law of nature.

    We present a model of LLM neural-network dynamics that exhibits power-law behavior and matches the ECF constraint, C_{min}(N) = a N^{-b}. The prefactor a = 0.00000001 sets the scale of the neural-network connections, viz., on the order of billions, while the exponent b = 0.05 is indicative of subnetwork correlations that are much stronger than Zipf’s law. In this way, we are able to answer the original question in the negative.

    Our result notwithstanding, and given that the 2024 Nobel Prize in Physics was shared by an AI researcher, this burgeoning area of Generative AI (and possibly related areas) would seem to offer fertile ground for interdisciplinary physics.
  3. What is the ECF? Outline

    1. What is the ECF?
    2. Computer Metastability
    3. Instantons in the Machine
    4. The Final Frontier
    5. Conclusion
    6. References
  4. What is the ECF? What is GPT?

    Definition 1 (Abbreviation, but it works backwards)
    1. Generative: capable of producing new objects (text, images, etc.)
    2. Pre-trained: large input dataset where the outcome is known
    3. Transformer: massively parallel feedforward multi-layer neural network [1]

    A GPT LLM identifies patterns of unique text tokens and runs on special hardware (GPUs).
    NN connection weights are iteratively evaluated using multivariate regression on steroids.
    My focus here is on the pre-training performance of OpenAI’s GPT version 3 [2].
  5. What is the ECF? The Efficient Compute Frontier (ECF)

    Figure 1: Error (reduction) of pre-training GPT-3 curves as a function of compute (time) [2]. The ECF is the power-law asymptote or barrier (dashed line). The inset shows the same curves on log-linear axes, where the intrinsic sigmoid characteristic is manifest.
  6. What is the ECF? ECF is Ubiquitous

    Figure 2: Power-law ECF observed across different performance metrics.

    LLM models where “large” means ∼150 × 10^9 GPT-3 params (NN connections) [2]
    Lower error or “loss” means better text sequence prediction
    Successive pre-training curves ≡ increasing LLM params by many millions each time
    Compute time is measured in logarithmic days ... PF-days in Fig. 2(a)
    Asymptotic power law [3, 4] is present across different LLM metrics (Fig. 2)
    “We don’t know why?” [5]
    “Is this some new fundamental law of nature?” [6]
  8. Computer Metastability. Metastability in Computer Systems

    Figure 3: Mean arrival rate (red curve) into the queue and departure rate (blue) from the queue (inset) with mean length N. Queue servicing (blue disk) has (N − 1) waiting requests (red blocks).

    If the service time in Fig. 3 increases with N (the amount of compute work), the queue is load-dependent (LD) [7, 8].
    An LD queue models the dominant stochastic behavior of complex systems [8, 9, 10].
    Stability conditions:
    1. N_{opt}: stable optimal queue (left)
    2. N_{crit}: unstable queue length (center)
    3. N_{slow}: stable congested queue (right)
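
The three stability conditions listed on the metastability slide can be located numerically as the queue lengths where arrival and departure rates balance. What follows is a minimal Python sketch, not the author's calculation. The departure rate uses the standard USL form X(N) = γN / (1 + α(N − 1) + βN(N − 1)) that the deck cites for throughput; the linear arrival rate (population − N) / think_time, the function names, and every parameter value are assumptions, chosen only so that three crossings appear.

```python
def usl_throughput(n, gamma=1.0, alpha=0.05, beta=0.001):
    """Departure rate X(N) in the Universal Scalability Law form."""
    return gamma * n / (1.0 + alpha * (n - 1.0) + beta * n * (n - 1.0))


def arrival_rate(n, population=500.0, think_time=60.0):
    """Assumed linear arrival rate: fewer arrivals as more requests are queued."""
    return (population - n) / think_time


def drift(n):
    """Net growth rate of the queue: arrivals minus departures."""
    return arrival_rate(n) - usl_throughput(n)


def bisect_root(f, lo, hi, tol=1e-9):
    """Refine a sign change of f on [lo, hi] by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fixed_points(n_max=500):
    """Return (N, label) for every sign change of the drift on [0, n_max]."""
    points = []
    for n in range(n_max):
        left, right = drift(float(n)), drift(float(n + 1))
        if (left > 0) != (right > 0):
            root = bisect_root(drift, float(n), float(n + 1))
            # Drift going from + to - pushes the queue back toward the root.
            label = "stable" if left > 0 else "unstable"
            points.append((root, label))
    return points


if __name__ == "__main__":
    for root, label in fixed_points():
        print(f"N = {root:7.2f}  {label}")
```

With these illustrative values the crossings fall at roughly N ≈ 16, 84, and 351, labeled stable, unstable, and stable. They play the roles of N_{opt}, N_{crit}, and N_{slow} in character only; the numbers carry no meaning for any real system.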
  10. Computer Metastability

    Figure 4: (a) Linear arrival rate and nonlinear departure rate (throughput), the latter defined analytically by the Universal Scalability Law (USL) [10, 11, 12]. (b) Difference of the curves in Fig. 4(a). (c) Integral of the drift function in Fig. 4(b). Stability points become a local minimum (left), central maximum, and global minimum (right) [13] (cf. the asymmetric double-well potential in QM tunneling).

    Stochastic queue stability is conveniently visualized as a Brownian particle in R^1, initially fluctuating in the upper valley of (c).
    If it reaches the central peak, it may or may not come back.
    If it reaches the lower valley, it will take a long time to come back.
    The congested queue is more stable [13, 14, 15, 16]: lowest cost in Fig. 4(c) but sub-optimal performance.

    Q: Where is the Brownian particle located?
    A: It’s glued onto the tail of the queue.
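
The step from Fig. 4(b) to Fig. 4(c), integrating the drift to obtain a cost landscape, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's code: the sign convention (departure minus arrival, so that stable queue lengths become minima), the linear arrival rate, the function names, and all parameter values are assumptions of this sketch, chosen only to reproduce the qualitative double-well shape.

```python
def usl_throughput(n, gamma=1.0, alpha=0.05, beta=0.001):
    """Departure rate X(N) in the Universal Scalability Law form."""
    return gamma * n / (1.0 + alpha * (n - 1.0) + beta * n * (n - 1.0))


def arrival_rate(n, population=500.0, think_time=60.0):
    """Assumed linear arrival rate (hypothetical parameters)."""
    return (population - n) / think_time


def cost_curve(n_max=500, steps_per_unit=20):
    """Cumulative trapezoid integral of (departure - arrival) from 0 to n_max."""
    h = 1.0 / steps_per_unit
    grid = [i * h for i in range(n_max * steps_per_unit + 1)]
    net = [usl_throughput(n) - arrival_rate(n) for n in grid]
    cost = [0.0]
    for i in range(1, len(grid)):
        cost.append(cost[-1] + 0.5 * h * (net[i - 1] + net[i]))
    return grid, cost


def local_extrema(grid, cost):
    """Interior grid points where the cost curve turns around."""
    found = []
    for i in range(1, len(cost) - 1):
        if cost[i] < cost[i - 1] and cost[i] < cost[i + 1]:
            found.append(("min", grid[i], cost[i]))
        elif cost[i] > cost[i - 1] and cost[i] > cost[i + 1]:
            found.append(("max", grid[i], cost[i]))
    return found


if __name__ == "__main__":
    grid, cost = cost_curve()
    for kind, n, c in local_extrema(grid, cost):
        print(f"{kind} at N = {n:7.2f}, C(N) = {c:9.2f}")
```

With these illustrative values the landscape has a shallow minimum on the left, a barrier in the middle, and a deeper minimum on the right, so the congested state is the global minimum, as described for Fig. 4(c).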
  12. Instantons in the Machine. USL Cost Function in 3D

    Recall the metastable Brownian particle cost function C(N) from Fig. 4(c).
    A ball bearing can roll from the local minimum (left) to the global minimum (right).
    It’s the classical path of least action [13, 14] during some period Δt in R^3.
    A single giant fluctuation or large deviation in the queueing model [17].
    The ball-bearing path N(t) over the hump is the instanton (Fig. 5) [18, 19].

    Figure 5: USL cost function C(N) and instanton path N(t) in 3D (blue dots). Axes: N (ticks 50 to 200) and C(N).
  13. Instantons in the Machine. Instanton Solution in 3D

    Figure 6: Fig. 5 oriented to match the LLM loss-compute curves in Fig. 1.

    No analytic instanton solution for C(N) in Fig. 5
    Instanton N(t) is computed numerically in Fig. 6
    N(t) is the number of unprocessed LLM tokens
    For some value of “instant” ... day-decades in Fig. 1 (depends on hardware config)
  15. The Final Frontier. USL Compute Frontier

    Figure 7: Selected successive global USL cost minima (curves), each rescaled by an order of magnitude, lie on the power-law bound C_{min}(N) = 0.00000001 N^{-0.05} (dashed line). Axes: N (ticks 10, 100, 1000) and C(N) (ticks 2 to 5). Inset: cf. arbitrary LLM instanton error curves from Fig. 1.
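
How a bound of the form C_{min}(N) = a N^{-b} would be read off a set of cost minima can be shown with a log-log least-squares fit. The Python sketch below uses synthetic points only: the three (N, C) pairs are generated from the quoted law itself at N = 10, 100, 1000 and are not the author's data, and the function names are hypothetical.

```python
import math

A_QUOTED = 1e-8   # prefactor a as quoted in the deck
B_QUOTED = 0.05   # exponent b as quoted in the deck


def frontier(n, a=A_QUOTED, b=B_QUOTED):
    """Power-law bound C_min(N) = a * N**(-b)."""
    return a * n ** (-b)


def fit_power_law(ns, cs):
    """Least-squares line through (log N, log C); returns (a, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(c) for c in cs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # log C = log a - b log N, so a = exp(intercept) and b = -slope.
    return math.exp(intercept), -slope


if __name__ == "__main__":
    ns = [10.0, 100.0, 1000.0]           # synthetic sample sizes
    cs = [frontier(n) for n in ns]       # synthetic minima on the bound
    a_fit, b_fit = fit_power_law(ns, cs)
    print(f"a = {a_fit:.3e}, b = {b_fit:.4f}")
```

On a log-log plot the bound is a straight line with slope −b and intercept log a, which is why a linear fit in the logarithms recovers both parameters.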
  17. Conclusion. Conclusions

    Theorem 2 (Instanton Trajectories): The sigmoidal loss-compute curves are the instanton trajectories, N(t), belonging to successively larger metastable LLM models.

    Instantons in
    1. Physical systems: QM tunneling [19], QFT vacuum decay [19, 20], binary interface between thermodynamic phases [20].
    2. Computer systems: Internet collapse [21, 13], virtual memory thrashing [13], packet radio degradation [14, 22, 23] ... Not unique to LLMs.

    Theorem 3 (Efficient Compute Frontier): The power-law lower bound (ECF)

        C_{min}(N) = a N^{-b}    (1)

    is defined by the depth of the global minimum in the USL cost (error) function, C(N), of successively larger metastable LLM models.
  19. Conclusion. But, there’s a twist!!

    Theorem 4 (Congestion is Correlation): Unlike conventional computer systems, where a long queue represents performance degradation (long stable waiting line), the long LLM queue is associated with strong correlations between the locations of tokens in the queue [24].

    Strongly correlated “congestion” is the optimum for LLMs (unlike computers).
    Recomputing “congested” tokens does not lower the error rate at the global USL minimum.
    Why is the ECF exponent b = 0.05 much smaller than the Zipf’s law exponent b = 1?

    Theorem 5 (Zipf’s Law): Exponent b ≪ 1 implies very strong correlations.

    LLM iterated sequencing leads to the highest-probability tokens in the queue.
    The relative token positions in the queue become highly ordered, which means they are more strongly correlated than context-free frequency counts of (English) words in a corpus.

    Q: Why haven’t I discussed the details of how GPT-type LLMs work?
    A: Near the critical point, the system forgets what it is.
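
To put the two exponents side by side, here is a short worked comparison. It uses only the power-law form of Eq. (1) and the exponents quoted on the slide; the factor of 10^3 in N is an arbitrary illustrative choice.

```latex
\[
\frac{C_{\min}(N_2)}{C_{\min}(N_1)} = \left(\frac{N_2}{N_1}\right)^{-b},
\qquad
\frac{N_2}{N_1} = 10^{3}
\;\Longrightarrow\;
\begin{cases}
10^{-0.15} \approx 0.71, & b = 0.05 \quad \text{(ECF)} \\
10^{-3} = 0.001, & b = 1 \quad \text{(Zipf)}
\end{cases}
\]
```

A thousandfold increase in N lowers the ECF bound by only about 29 percent, whereas a Zipf-like exponent would lower it a thousandfold, which is the sense in which b = 0.05 is "much smaller".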
  21. References. References I

    [1] A. Vaswani, N. Shazeer, et al., “Attention Is All You Need,” 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA (2017) arXiv
    [2] J. Kaplan, S. McCandlish, T. J. Henighan, et al., “Scaling Laws for Neural Language Models,” arXiv (2020)
    [3] M. Newman, “Power laws, Pareto distributions and Zipf’s law,” Contemporary Physics, 46(5), 323–351 (2005)
    [4] N. Maroney and N.J. Gunther, “Power Law Analysis of MagCloud Publications,” HP Labs Internal Technical Report CW237032 (2011)
    [5] Welch Labs, “Can’t Cross This Line and We Don’t Know Why,” YouTube, September 13 (2024)
    [6] hampton—e/acc, “Have we discovered a fundamental law of nature for building intelligent systems?” Twitter/X, August 23 (2024)
    [7] E.D. Lazowska, J. Zahorjan, G.S. Graham, and K.C. Sevcik, Quantitative System Performance: Computer System Analysis Using Queueing Network Models, Englewood Cliffs: Prentice-Hall (1984)
    [8] N.J. Gunther, Analyzing Computer System Performance with Perl::PDQ, 2nd Edition, Springer (2011)
    [9] P.J. Courtois, “Decomposability, Instabilities, and Saturation in Multiprogramming Systems,” Comm. ACM, 18(7): 371-377 (1975)
    [10] N.J. Gunther, Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services, Springer (2007)
    [11] N.J. Gunther, “A General Theory of Computational Scalability Based on Rational Functions,” arXiv (2008)
    [12] N.J. Gunther, How to Quantify Scalability: The Universal Scalability Law (USL), 27 Feb (2020)
    [13] N.J. Gunther, “Path Integrals for Computers,” Information Processing Letters, 32(1): 7-13 (1989)
  22. References. References II

    [14] N.J. Gunther and J.G. Shaw, “Path Integral Evaluation of ALOHA Network Transients,” Information Processing Letters, 33(6): 289-295 (1990)
    [15] N.J. Gunther, “Instanton Techniques for Queueing Models of Large Computer Systems: Getting a Piece of the Action,” invited paper presented at SIAM Conference on Applied Probability in Science and Engineering, New Orleans, Louisiana (1990)
    [16] N.J. Gunther, “Bilinear Model of Blocking Transients in Large Circuit-Switching Networks,” in PERFORMANCE’90, Proceedings of the 14th IFIP WG 7.3 International Symposium on Computer Performance Modelling, Measurement and Evaluation, Edinburgh, Scotland, 12-14 September, Amsterdam: North-Holland, 175-189 (1990)
    [17] A. Ganesh, N. O’Connell and D. Wischik, Big Queues, Lecture Notes in Mathematics, Springer-Verlag, Heidelberg (2004)
    [18] G. ’t Hooft, Phys. Rev. D14, 3432 (1976)
    [19] S. Coleman, “The Uses of Instantons,” Chap. 7 in Aspects of Symmetry, Cambridge Univ. Press (1985)
    [20] N.J. Gunther, D.A. Nicole and D.J. Wallace, “Goldstone Modes in Vacuum Decay and First-Order Phase Transitions,” J. Phys. A 13, 1755-1767 (1980)
    [21] V. Jacobson, “Congestion Avoidance and Control,” ACM SIGCOMM Computer Communication Review, 18(4): 314-329, August (1988)
    [22] R.M. Metcalfe, “Steady-state Analysis of a Slotted and Controlled ALOHA System with Blocking,” in Proc. VI Hawaii Conf. on System Sciences (1973)
    [23] R.M. Metcalfe and D.R. Boggs, “Ethernet: Distributed Packet Switching for Local Computer Networks,” Comm. ACM 19(7): 395 (1976)
    [24] J.F. Brady and N.J. Gunther, “How to Emulate Web Traffic Using Standard Load Testing Tools,” Proc. CMG Conference, La Jolla, California, arXiv (2016)
  23. References. Questions?

    Thank you for attending.
    www.perfdynamics.com, Castro Valley, California
    Twitter: twitter.com/DrQz
    Facebook: facebook.com/PerformanceDynamics
    Blog: perfdynamics.blogspot.com
    Training: perfdynamics.com/Classes
    Email: [email protected]