The Fusion of Mathematical Optimization and AI (MOAI): History and Outlook (Short Version)

The Fusion of Mathematical Optimization and AI (MOAI): History and Outlook
Mikio Kubo

Self-introduction
• https://www.youtube.com/@kubomikio
• https://x.com/MickeyKubo
• https://www.logopt.com/kubomikio/
• Tokyo University of Marine Science and Technology: Professor
• Moai Lab: Director & CTO
• A-Star Quantum: Director
• Optimind: Advisor
• Ridge-i: Technical Advisor

MOAI = Fusion of Three Fields
• Mathematical Optimization
• Metaheuristics
• Machine Learning, Deep Learning, LLMs, Gen. AI, Agentic AI
MOAI:
• Hierarchical building block
• AutoOpt
• Math-heuristics
• Distributionally robust optimization
• Encode-decode method
• MO hybrid for dynamic stochastic models
• End-to-end learning
• Supply chain modeling language
• Modeling and app generation with agentic AI (AGI4OPT)
• Automatic decomposition
• Instance similarity

History of Mathematical Optimization
• 1945 Stigler's Diet Problem: first linear program
• 1947 Dantzig: Simplex method
• 1954 Dantzig-Fulkerson-Johnson: Traveling salesman problem
• 1957 Bellman: Dynamic programming
• 1958 Gomory: Cutting plane
• 1971 Cook's Theorem: NP-completeness
• 1979 Khachiyan: Ellipsoid method
• 1984 Karmarkar: Interior point method
• 1985 AMPL: A Math. Prog. Language
• 1988- CPLEX
• 1989 Kojima: Primal-dual interior point method
• 2008- Gurobi

History of Metaheuristics
• 1953 Metropolis et al.: Markov chain Monte Carlo
• 1970 & 73 Kernighan-Lin: Variable depth local search
• 1975 Holland: Genetic algorithm
• 1977 Glover: Scatter search
• 1983 Kirkpatrick et al.: Simulated annealing
• 1986 Glover: Tabu search
• 1989-1990s Johnson et al.: Experimental analysis
• 1995 Wolpert-Macready: No free lunch theorem
• 2001 Kubo-Miyamoto: Hierarchical building block method
• 2002 Nonobe-Ibaraki: RCPSP solver

History of Deep Learning and Neural Net (NN)
• 1943 McCulloch-Pitts: Perceptron
• 1958 Rosenblatt: Perceptron machine
• 1967 Amari: Multilayer perceptron
• 1969 Fukushima: ReLU
• 1972 Amari: Recurrent NN
• 1979 Fukushima: Neocognitron
• 1982 Hopfield: Hopfield machine
• 1985 Hinton et al.: Backpropagation
• 1987 LeCun: Convolutional NN
• 1997 Hochreiter-Schmidhuber: LSTM
• 2006- Fei-Fei Li: ImageNet
• 2012 AlexNet
• 2014 Goodfellow: GAN
• 2015 ResNet
• 2015 TensorFlow
• 2017 Transformer
• 2018 GPT
• 2021 Stable Diffusion
• 2022 ChatGPT

Classifications of MOAI
• ML -> MO (ML-first, MO-second)
• MO -> ML (MO-first, ML-second)
• MO ⊃ ML (ML assists MO, ML4MO)
• ML ⊃ MO (MO assists ML, MO4ML)
• Mutual engagement with basic optimization theory
• RL/MPC
• Modeling with LLM
Abbreviations:
• MO: Mathematical Optimization
• ML: Machine Learning
• RL: Reinforcement Learning = O + ML
• MPC: Model Predictive Control = O + ML

Pattern 1: ML → MO
Apply MO after making predictions with ML. This is the most classic and natural approach (predict, then optimize).
1. Predict with ML (function approximation), then solve the approximated constraints or objective function with MO
✓ Embed ML prediction models as mathematical formulas in MO (Gurobi 10.0+)
✓ Optimize ML as a black box
✓ Approximate an ML prediction model by an MO-solvable function under realistic assumptions
2. Preprocess with ML (e.g., clustering), then apply MO
Flow: Data → ML → MO → Solution
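The predict-then-optimize flow can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the historical data, the single feature, the least-squares predictor standing in for "ML", and the 0-1 knapsack standing in for "MO". It is not taken from the slides.

```python
def fit_line(xs, ys):
    """ML step: ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def knapsack(values, weights, capacity):
    """MO step: exact 0-1 knapsack by dynamic programming (integer weights)."""
    n = len(values)
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if weights[i - 1] <= c:
                cand = best[i - 1][c - weights[i - 1]] + values[i - 1]
                if cand > best[i][c]:
                    best[i][c] = cand
    # Backtrack to recover which items were selected.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)

# Hypothetical history: feature -> observed item value.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0, 5.0], [12.0, 19.0, 31.0, 42.0, 49.0])

def predict(x):
    return slope * x + intercept

# New items: values are unknown, so predict them first, then optimize.
new_features = [2.5, 1.5, 4.5]
weights = [3, 2, 4]
predicted_values = [predict(x) for x in new_features]
total, chosen = knapsack(predicted_values, weights, capacity=5)
print("predicted values:", predicted_values)
print("selected items:", chosen, "predicted total:", total)
```

The two steps are decoupled: the predictor is trained for accuracy only and knows nothing about the downstream knapsack, which is the property that decision-focused learning tries to improve on.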

ML -> MO (Constraint Learning)
• Gurobi ML https://github.com/Gurobi/gurobi-machinelearning
Prediction models that can be embedded:
• Linear regression
• Polynomial regression (quadratic)
• Logistic regression (approximating nonlinear functions with piecewise linear functions)
• Neural network (fully connected layers and ReLU only)
• Decision tree
• Gradient boosting
• Random forest
The feature vector x (variables or constants) is mapped by the ML model to the forecast target y, and this mapping enters the optimization model as constraints:
min f(x, y)  s.t.  y = ML(x),  x ∈ X
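To make "embedding an ML model as constraints" concrete, the following is the textbook big-M formulation of a single ReLU neuron y = max(0, wᵀx + b). It is shown as an illustration of the idea and is not claimed to be the exact formulation any particular solver package uses. The bounds L and U and the binary variable z are assumptions introduced here.

```latex
% Assumption: known bounds L <= w^T x + b <= U with L < 0 < U.
\begin{aligned}
& y \ge w^\top x + b, \qquad y \ge 0, \\
& y \le w^\top x + b - L\,(1 - z), \\
& y \le U z, \qquad z \in \{0, 1\}.
\end{aligned}
```

With z = 1 the constraints force y = wᵀx + b ≥ 0, and with z = 0 they force y = 0 and wᵀx + b ≤ 0, so y equals the ReLU output in both cases. Tighter bounds L and U give a stronger relaxation.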

Decision-Focused Learning
Software: PyEPO https://github.com/khalil-research/PyEPO
✓ SPO+ Smart Predict-then-Optimize
✓ DBB Differentiable Black-box Optimizer
✓ DPO Differentiable Perturbation Optimizer
✓ PFYL Perturbed Fenchel-Young Loss
✓ NCE Noise Contrastive Estimation
✓ LTR Learning to Rank
Reference: A. N. Elmachtoub and P. Grigas, Smart "Predict, then Optimize", Management Science, Volume 68
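As a rough illustration of what SPO+ measures, the sketch below evaluates the SPO loss (decision regret) and the SPO+ surrogate from Elmachtoub and Grigas for a minimization problem whose feasible set is a small finite list. The helper names, the enumeration oracle, and the cost vectors are hypothetical, and this is not the PyEPO API.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(cost, feasible):
    """Optimization oracle: a minimizer of cost^T w over a finite feasible set."""
    return min(feasible, key=lambda w: dot(cost, w))

def spo_loss(c_hat, c, feasible):
    """Regret: true cost of the decision induced by c_hat minus the true optimum."""
    return dot(c, solve(c_hat, feasible)) - dot(c, solve(c, feasible))

def spo_plus_loss(c_hat, c, feasible):
    """SPO+ surrogate: max_w (c - 2 c_hat)^T w + 2 c_hat^T w*(c) - z*(c)."""
    w_star = solve(c, feasible)
    z_star = dot(c, w_star)
    shifted = [ci - 2 * hi for ci, hi in zip(c, c_hat)]
    return max(dot(shifted, w) for w in feasible) + 2 * dot(c_hat, w_star) - z_star

# Hypothetical instance: choose exactly one of three routes.
routes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
true_cost = (3, 1, 2)
predicted_cost = (1, 2, 3)  # a poor prediction that favours the wrong route

print("SPO loss :", spo_loss(predicted_cost, true_cost, routes))
print("SPO+ loss:", spo_plus_loss(predicted_cost, true_cost, routes))
```

On this instance the surrogate is larger than the regret, consistent with SPO+ being a convex upper bound on the SPO loss, and both vanish when the prediction equals the true cost vector.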

Pattern 2: MO → ML
• Machine learning after optimization
• Example: shipping optimization with known past customer demand
• Optimization is performed on historical data to find the minimum number of trucks for each day
• A regression model is then trained to predict the number of trucks from information on past days
Flow: Data → MO → ML → Solution
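The truck example can be sketched as follows, under loud assumptions: the order sizes, the truck capacity, and the choice of total daily volume as the regression feature are all hypothetical, and a tiny exact bin-packing search stands in for the real shipping optimizer.

```python
def min_trucks(sizes, capacity):
    """MO step: exact bin packing by backtracking (only for a handful of orders).
    Assumes every order fits in one truck."""
    sizes = sorted(sizes, reverse=True)
    best = len(sizes)

    def place(i, loads):
        nonlocal best
        if len(loads) >= best:        # cannot improve on the incumbent
            return
        if i == len(sizes):
            best = len(loads)
            return
        for j in range(len(loads)):   # try every truck already opened
            if loads[j] + sizes[i] <= capacity:
                loads[j] += sizes[i]
                place(i + 1, loads)
                loads[j] -= sizes[i]
        loads.append(sizes[i])        # or open a new truck
        place(i + 1, loads)
        loads.pop()

    place(0, [])
    return best

def fit_line(xs, ys):
    """ML step: ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

CAPACITY = 10
history = [[4, 4, 3], [6, 5, 5, 4], [9, 8, 7, 3, 3], [2, 2], [7, 6, 6, 5, 4, 2]]

# Optimize each historical day to obtain the training labels.
labels = [min_trucks(day, CAPACITY) for day in history]
volumes = [sum(day) for day in history]
slope, intercept = fit_line(volumes, labels)

def predict_trucks(volume):
    """Predict the number of trucks for a day with the given total volume."""
    return max(0, round(slope * volume + intercept))

print("optimal trucks per day:", labels)
print("predicted trucks for volume 30:", predict_trucks(30))
```

The regression answers instantly but only approximately, so in practice it suits early capacity planning, with the optimizer rerun once the actual orders are known.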

Pattern 3: MO ⊃ ML (ML4MO)
ML assists MO (the main objective is optimization)
1. Use ML to select the solution method or algorithm and to set its (hyper)parameters
2. ML/RL inside the optimization: branching rules and cutting planes are improved with ML (RL)
3. Combinatorial optimization with reinforcement learning
4. Generate data (instance-solution pairs) using MO and train an ML model on them
Use ML to speed up MO for new problem instances:
1. ML returns solution hints and constraints to be satisfied, speeding up MO (MIPLearn)
2. ML returns an approximate (infeasible) solution, which is converted to a nearby feasible solution (optimization proxy)
3. ML returns equality constraints and the values of integer variables, from which a solution is generated (Optimization Voice)

Classification of ML4MO
• ML returns solution (using MO to create training data): Instances → MO → Solution for training, then Instances → ML → Solution
• ML before optimization: Instances → ML → Parameter, Algorithm → MO → Solution
• ML inside optimization: Instances → MO (with ML inside) → Solution
• Solving combinatorial optimization problems with RL: Instances → RL → Solution

Encode-Decode Method
Flow: Instances → ML → an encoding of the solution → Decoder → Solution
Complexity of Encoding << Complexity of Solution
Decoder: an algorithm that quickly recovers the solution from the encoding
Example: Scheduling Problem
• Encoding: priority of jobs + mode of jobs. Decoding is an active schedule generation scheme
• Encoding: priority of jobs. Decoding is optimization with a scheduling solver, OptSeq
Example: Delivery Planning Problem
• Encoding: pre-cycle. Decoding is dynamic programming
• Encoding: assignment to vehicles (with overlap for flexibility) + pre-cycle circuits. Decoding is a recourse strategy
Encoding and decoder in related methods:
• MIPLearn: encoding = additional constraints, decoder = MIP solver
• Opt. Proxies: encoding = infeasible solutions, decoder = repair layers
• Opt. Voice: encoding = equality constraints and values of integer variables, decoder = system of linear equations
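A minimal sketch of the scheduling example: the encoding is one priority value per job (the kind of vector an ML model could output), and the decoder is a simple serial list-scheduling rule on identical parallel machines. The jobs, durations, precedence relations, and priorities are hypothetical, and this decoder is a simplified stand-in, not OptSeq and not necessarily the scheme used by the author.

```python
def decode(priority, duration, preds, n_machines):
    """Decoder: turn a priority vector into a feasible schedule.
    Returns {job: (machine, start, finish)}. Assumes the precedence graph is acyclic."""
    free_at = [0] * n_machines
    schedule = {}
    remaining = set(duration)
    while remaining:
        # Jobs whose predecessors have all been scheduled.
        eligible = [j for j in remaining
                    if all(p in schedule for p in preds.get(j, ()))]
        # Highest priority first; ties broken by job name for determinism.
        job = min(eligible, key=lambda j: (-priority[j], j))
        ready = max((schedule[p][2] for p in preds.get(job, ())), default=0)
        machine = min(range(n_machines), key=lambda m: free_at[m])
        start = max(free_at[machine], ready)
        finish = start + duration[job]
        schedule[job] = (machine, start, finish)
        free_at[machine] = finish
        remaining.remove(job)
    return schedule

def makespan(schedule):
    return max(finish for _, _, finish in schedule.values())

duration = {"A": 3, "B": 2, "C": 2, "D": 1}
preds = {"C": ["A"], "D": ["B"]}                       # C after A, D after B
priority = {"A": 0.9, "B": 0.8, "C": 0.5, "D": 0.1}    # the encoding

schedule = decode(priority, duration, preds, n_machines=2)
print(schedule)
print("makespan:", makespan(schedule))
```

The encoding has one number per job, while the solution assigns a machine and a start time to every job. Any priority vector decodes to a feasible schedule, so a learned model cannot produce an infeasible output.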

Pattern 4: ML ⊃ MO (MO4ML)
MO assists ML (the main objective is machine learning)
• Perform ML tasks (classification, regression) using (M)O
• MO models for feature selection and for optimal decision trees
• Application of optimization methods to constrained ML models (e.g., Lagrangian relaxation)
Flow: Data → ML (with MO inside) → Classification, Regression

Open-source Packages for Optimal Decision Trees
• https://github.com/LucasBoTang/Optimal_Classification_Trees : comparison of OCT (Optimal Classification Tree), BinaryOCT, flowOCT
• MurTree https://bitbucket.org/EmirD/murtree/src/master/ : dynamic programming
• DL8.5 https://dl85.readthedocs.io/en/latest/user_guide.html : branch and bound using data mining
• https://github.com/pan5431333/pyoptree : OCT and local search

Pattern 5: Cross-pollination at the Foundational Level
• The optimization side of both fields shares the same foundations
✓ Optimization in DL (nonconvex nonlinear optimization): momentum, Adam, fit-one-cycle (experimental)
✓ Nondifferentiable optimization theory: Nesterov acceleration, theoretical convergence proofs
• Mathematical optimization is used to interpret machine learning models mathematically
✓ Insight for improvements (adding sparsity to models)
✓ Convergence proofs
✓ New model ideas
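Momentum and Nesterov acceleration differ only in where the gradient is evaluated, which a few lines make visible. The sketch below applies both to a small ill-conditioned quadratic. The test function, step size, and momentum coefficient are assumptions chosen so that both iterations converge.

```python
def f(x):
    """Hypothetical test function: f(x) = 0.5 * (x0^2 + 10 * x1^2), minimized at 0."""
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad(x):
    return [x[0], 10.0 * x[1]]

def heavy_ball(x, lr=0.05, beta=0.9, steps=300):
    """Classical momentum: gradient evaluated at the current point."""
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x

def nesterov(x, lr=0.05, beta=0.9, steps=300):
    """Nesterov acceleration: gradient evaluated at the look-ahead point x + beta * v."""
    v = [0.0] * len(x)
    for _ in range(steps):
        look_ahead = [xi + beta * vi for xi, vi in zip(x, v)]
        g = grad(look_ahead)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x

start = [1.0, 1.0]
print("heavy ball:", f(heavy_ball(start)))
print("nesterov  :", f(nesterov(start)))
```

On a convex quadratic both methods have textbook convergence guarantees. In deep learning the same update rules are applied to nonconvex losses, where their behaviour is largely established experimentally, as the slide notes.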

Pattern 6: Fusion of Dynamic Models
• Dynamic optimization problems where future information is either unavailable or uncertain
• Blending the following disciplines:
✓ Dynamic Programming (DP), Approximate DP, neuro DP
✓ Reinforcement Learning (RL)
✓ Model Predictive Control (MPC)
✓ Multi-period stochastic or robust optimization with affine recourse function adjustment inspired by linear feedback in control
Flow: Instances → ML / MO / RL/MPC → Solution

Comparison of DP / RL / MPC / Stochastic (Robust) Optimization / MO Hybrid
Columns: DP | Approx. DP | RL | MPC | Stochastic (Robust) Optimization | MO Hybrid
• Model: ◯ | ◯ | △ | ◯ | ◯ | ◯
• Forecasting: ◯, ◯; MO Hybrid: forecast using past instances and contexts
• Value Function: ◯, ◯ (post-action state), ◯, ◯; MO Hybrid: defined on states and post-action states
• Optimization: greedy; one-period optimization; greedy, tree search, beam search, rollout; finite horizon (convex quadratic optimization); stochastic (robust) optimization; MO Hybrid: (M)O on a finite-horizon problem using here-and-now and recourse variables + ML
• Approximate Value Function: piecewise-linear functions; deep learning; MO Hybrid: piecewise-linear, neural net, decision tree
• Adjustable Function: ◯, ◯, ◯

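MO Hybrid = ML + (M)O + MPC + RL
[Figure: rolling-horizon pipeline over periods $t-1, t, t+1, t+2, \dots, T, T+1, \dots$]
• State: past instances $I_i$ $(i = \dots, t-1, t)$
• Forecast (ML): future instances $(\tilde{I}_{t+1}, \tilde{I}_{t+2}, \dots, \tilde{I}_T, \dots)$
• Instance generation: the instance solved at period $t$ is $(I_t^{\langle t \rangle}, \tilde{I}_{t+1}^{\langle t \rangle}, \tilde{I}_{t+2}^{\langle t \rangle}, \dots, \tilde{I}_T^{\langle t \rangle})$
• (M)O: computes the solution $(X_t^{\langle t \rangle}, X_{t+1}^{\langle t \rangle}, X_{t+2}^{\langle t \rangle}, \dots, X_T^{\langle t \rangle})$ with objective $\max\; v(x) + V(S_{T+1})$
• Training data: past instances $(I_i^{\langle i \rangle}, \tilde{I}_{i+1}^{\langle i \rangle}, \tilde{I}_{i+2}^{\langle i \rangle}, \dots)$ and solutions $(X_i^{\langle i \rangle}, X_{i+1}^{\langle i \rangle}, X_{i+2}^{\langle i \rangle}, \dots)$ for $i = \dots, t-1$, used by ML (Encode-Decode) to produce a solution encoding
• State value function (ML (RL)): $V(S_i)$ on states $S_{t-1}, \dots, S_i, \dots, S_{T+1}$
• MPC: $(X_t^{\langle t \rangle}, X_{t+1}^{\langle t \rangle}, X_{t+2}^{\langle t \rangle}, \dots, X_{T-1}^{\langle t \rangle}) \approx (X_t^{\langle t-1 \rangle}, X_{t+1}^{\langle t-1 \rangle}, X_{t+2}^{\langle t-1 \rangle}, \dots, X_{T-1}^{\langle t-1 \rangle})$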
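The rolling-horizon loop behind this slide can be sketched on a toy inventory problem. All modelling choices below are assumptions for illustration: a moving average stands in for the ML forecast, brute-force enumeration stands in for (M)O, and a linear salvage term stands in for a learned value function V. The stabilizing MPC condition between consecutive plans is not modelled.

```python
from itertools import product

HOLD, SHORT = 1.0, 5.0          # hypothetical holding and shortage costs per unit
ORDER_OPTIONS = (0, 5, 10)      # hypothetical admissible order quantities

def forecast(history, horizon):
    """ML stand-in: forecast each future period by the mean of the last 3 demands."""
    recent = history[-3:]
    return [sum(recent) / len(recent)] * horizon

def terminal_value(stock):
    """Stand-in for a learned value function V(S_{T+1}): salvage value of leftover stock."""
    return 0.5 * stock

def plan(stock, demand_forecast):
    """(M)O step: maximize v(x) + V(S_{T+1}) over all order plans by enumeration."""
    best_plan, best_val = None, float("-inf")
    for orders in product(ORDER_OPTIONS, repeat=len(demand_forecast)):
        s, val = stock, 0.0
        for q, d in zip(orders, demand_forecast):
            s = s + q - d
            val -= HOLD * max(s, 0) + SHORT * max(-s, 0)
            s = max(s, 0)       # unmet demand is lost, not backlogged
        val += terminal_value(s)
        if val > best_val:
            best_plan, best_val = orders, val
    return best_plan

def simulate(true_demand, history, stock, horizon=3):
    """MPC loop: re-plan every period, apply only the first decision, observe demand."""
    applied = []
    for d in true_demand:
        orders = plan(stock, forecast(history, horizon))
        applied.append(orders[0])
        stock = max(stock + orders[0] - d, 0)
        history = history + [d]  # the new observation feeds the next forecast
    return applied, stock

applied, final_stock = simulate([4, 6, 5, 7], history=[5, 5, 5], stock=0)
print("orders applied:", applied, "final stock:", final_stock)
```

Only the first-period decision of each plan is executed. The remaining periods exist to keep that first decision from being myopic, and the terminal value plays the same role beyond the end of the horizon.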

Horizontal AI vs Vertical AI
Horizontal AI
• OpenAI GPT, Google Gemini, Anthropic Claude
Vertical AI
• Health Care: Tempus, PathAI, Viz.ai, Aidoc, Abridge, …
• Legal Sector: Harvey, EvenUp, Ironclad, Casetext, Everlaw, …
• Financial Service: Zest AI, HighRadius, ThetaRay, BioCatch, …
• Supply Chain Optimization

AGI4OPT (Hearing => Opt. App.)
https://www.moai-lab.jp/products/agi4opt
• User
• Hearing Agent
• Classification Agent: SC Network Design, Vehicle Routing, Production Scheduling, Inventory Policy, …
• Optimization Agent
• >= 10000 articles, >= 50 books
• AMPL Modeling + Coding Agent
• Code Execution Agent
• Data Mapping / Visualization Agent (Agentic Data Scientist)
• UI Generation Agent
• Data Upload

MOAI Data Platform
• Agentic IBP
• ERP Data
• Data Mapping
• Production Scheduling
• Lot Sizing
• Logistics/Service Network Design
• Vehicle Routing
• Supply Chain Risk Optimization
• Stage BOM
• Safety Stock Allocation
• Inventory Policy Optimization
• Shift Scheduling
