MIKIO KUBO
December 04, 2025

The Fusion of Mathematical Optimization and AI (MOAI): History and Outlook (Short Version)

Transcript

  1. Self-introduction

    https://www.youtube.com/@kubomikio
    https://x.com/MickeyKubo
    https://www.logopt.com/kubomikio/
    Tokyo University of Marine Science and Technology: Professor
    Moai Lab: Director & CTO
    A-Star Quantum: Director
    Optimind: Advisor
    Ridge-i: Technical Advisor
  2. MOAI = Fusion of Three Fields

    Fields: Mathematical Optimization; Metaheuristics; Machine Learning (Deep Learning, LLMs, Gen. AI, Agentic AI)
    MOAI topics:
    • Hierarchical building block
    • AutoOpt
    • Math-heuristics
    • Distributionally robust optimization
    • Encode-decode method
    • MO hybrid for dynamic stochastic models
    • End-to-end learning
    • Supply chain modeling language
    • Modeling and app generation with agentic AI (AGI4OPT)
    • Automatic decomposition
    • Instance similarity
  3. History of Mathematical Optimization

    1945 Stigler: Diet problem (first linear program)
    1947 Dantzig: Simplex method
    1954 Dantzig-Fulkerson-Johnson: Traveling salesman problem
    1957 Bellman: Dynamic programming
    1958 Gomory: Cutting plane
    1971 Cook's theorem: NP-completeness
    1979 Khachiyan: Ellipsoid method
    1984 Karmarkar: Interior point method
    1985 AMPL: A Math. Prog. Language
    1988- CPLEX
    1989 Kojima: Primal-dual interior point method
    2008- Gurobi
  4. History of Metaheuristics

    1953 Metropolis et al.: Markov chain Monte Carlo
    1970 & 1973 Kernighan-Lin: Variable depth local search
    1975 Holland: Genetic algorithm
    1977 Glover: Scatter search
    1983 Kirkpatrick et al.: Simulated annealing
    1986 Glover: Tabu search
    1989-1990s Johnson et al.: Experimental analysis
    1995 Wolpert-Macready: No free lunch theorem
    2001 Kubo-Miyamoto: Hierarchical building block method
    2002 Nonobe-Ibaraki: RCPSP solver
  5. History of Deep Learning and Neural Net (NN)

    1943 McCulloch-Pitts: Formal neuron model
    1958 Rosenblatt: Perceptron machine
    1967 Amari: Multilayer perceptron
    1969 Fukushima: ReLU
    1972 Amari: Recurrent NN
    1979 Fukushima: Neocognitron
    1982 Hopfield: Hopfield machine
    1985 Hinton et al.: Backpropagation
    1987 LeCun: Convolutional NN
    1997 Hochreiter-Schmidhuber: LSTM
    2006- Fei-Fei Li: ImageNet
    2012 AlexNet
    2014 Goodfellow: GAN
    2015 ResNet
    2015 TensorFlow
    2017 Transformer
    2018 GPT
    2021 Stable Diffusion
    2022 ChatGPT
  6. Classifications of MOAI

    • ML -> MO (ML-first, MO-second)
    • MO -> ML (MO-first, ML-second)
    • MO ⊃ ML (ML assists MO, ML4MO)
    • ML ⊃ MO (MO assists ML, MO4ML)
    • Mutual Engagement with Basic Optimization Theory
    • RL/MPC
    • Modeling (LLM)
    Abbreviations: MO: Mathematical Optimization; ML: Machine Learning; RL: Reinforcement Learning = O + ML; MPC: Model Predictive Control = O + ML
  7. Pattern 1 ML → MO

    Apply MO after making predictions in ML
    The most classic and natural (predict and then optimize) approach
    1. Predict with ML and function approximation => Solve the approximate constraint or objective function with MO
       ✓ Embedding ML prediction models as mathematical formulas in MO (Gurobi 10.0+)
       ✓ Optimize ML as a black box
       ✓ Approximate an ML prediction model as an MO-solvable function with realistic assumptions
    2. Preprocessing with ML (e.g., clustering) → MO
    [Diagram: Data, ML, MO, Solution]
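The predict-then-optimize flow of Pattern 1 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the price/demand data are invented, the ML step is a closed-form least-squares line, and the MO step is a plain grid search standing in for a real solver.

```python
# Predict-then-optimize sketch (hypothetical data, stdlib only).
# Step 1 (ML): fit demand = a + b * price by ordinary least squares.
# Step 2 (MO): pick the price that maximizes predicted revenue.

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical history: observed demand at each price.
prices = [1.0, 2.0, 3.0, 4.0]
demand = [90.0, 80.0, 70.0, 60.0]

a, b = fit_line(prices, demand)

def predicted_revenue(price):
    """Objective built from the fitted prediction model."""
    return price * (a + b * price)

# Grid search over candidate prices 0.0, 0.1, ..., 10.0
# (a stand-in for an optimization solver).
candidates = [i / 10 for i in range(101)]
best_price = max(candidates, key=predicted_revenue)
```

The fitted model enters the optimization only through `predicted_revenue`, which is the sense in which the prediction is "embedded" in the objective.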
  8. ML -> MO (Constraint Learning)

    • Gurobi ML https://github.com/Gurobi/gurobi-machinelearning
    • Linear regression
    • Polynomial regression (quadratic)
    • Logistic regression (approximating nonlinear functions with piecewise linear functions)
    • Neural network (fully connected layers and ReLU only)
    • Decision tree
    • Gradient boosting
    • Random forest
    ML maps the feature vector x (variable or constant) to the forecast target y, which enters the constraints:
    min f(x, y)  s.t.  x => ML => y,  x ∈ X
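The slide notes that logistic regression is handled by approximating the nonlinear function with piecewise linear functions. The sketch below illustrates that idea only; it does not use or reproduce the gurobi-machinelearning API, and the interval [-6, 6] and the segment counts are arbitrary assumptions.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def make_pwl(lo, hi, n_segments):
    """Breakpoints of a piecewise-linear interpolant of the logistic function."""
    step = (hi - lo) / n_segments
    xs = [lo + i * step for i in range(n_segments + 1)]
    ys = [logistic(x) for x in xs]
    return xs, ys

def eval_pwl(xs, ys, t):
    """Linear interpolation between breakpoints, clamped outside the range."""
    if t <= xs[0]:
        return ys[0]
    if t >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= t <= xs[i + 1]:
            w = (t - xs[i]) / (xs[i + 1] - xs[i])
            return (1 - w) * ys[i] + w * ys[i + 1]

def max_error(xs, ys, samples=2001):
    """Largest gap between the logistic function and its approximation."""
    lo, hi = xs[0], xs[-1]
    points = (lo + (hi - lo) * k / (samples - 1) for k in range(samples))
    return max(abs(logistic(t) - eval_pwl(xs, ys, t)) for t in points)

coarse_error = max_error(*make_pwl(-6.0, 6.0, 4))
fine_error = max_error(*make_pwl(-6.0, 6.0, 16))
```

More segments give a tighter approximation at the cost of a larger model once the breakpoints are encoded as constraints, which is the trade-off a solver-side formulation has to balance.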
  9. Decision-Focused Learning Software

    PyEPO https://github.com/khalil-research/PyEPO
    ✓ SPO+ Smart Predict-then-Optimize
    ✓ DBB Differentiable Black-box Optimizer
    ✓ DPO Differentiable Perturbed Optimizer
    ✓ PFYL Perturbed Fenchel-Young Loss
    ✓ NCE Noise Contrastive Estimation
    ✓ LTR Learning to Rank
    Reference: A. N. Elmachtoub and P. Grigas, Smart “Predict, then Optimize”, Management Science, Volume 68
  10. Pattern 2 MO → ML

    • Machine learning after optimization
    • Example: Shipping optimization with known past customer demand
    • Optimization is performed on historical data to find the minimum number of trucks for each day
    • A regression model then predicts the number of trucks from past-day information
    [Diagram: Data, MO, ML, Solution]
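The shipping example of Pattern 2 can be sketched as follows. The assumptions are loud ones: the order data and truck capacity are invented, loads are taken to be splittable so that the daily optimum is simply ceil(total / capacity), and the regression uses only the previous day's truck count as its feature.

```python
import math

CAPACITY = 10  # hypothetical truck capacity

def min_trucks(orders):
    """MO step: with splittable loads the optimum is ceil(total / capacity)."""
    return math.ceil(sum(orders) / CAPACITY)

def fit_line(xs, ys):
    """ML step: closed-form least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical history: order sizes for five consecutive days.
daily_orders = [
    [4, 6, 5],
    [9, 9, 3],
    [7, 8, 10, 2],
    [10, 10, 9, 4],
    [8, 9, 10, 10, 3],
]

# Optimize each historical day, then learn "trucks today" from "trucks yesterday".
trucks = [min_trucks(day) for day in daily_orders]
a, b = fit_line(trucks[:-1], trucks[1:])
next_day_prediction = a + b * trucks[-1]
```

The optimal solutions become the training labels, so the regression learns to forecast the decision itself rather than the raw demand.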
  11. Pattern 3 MO ⊃ ML, ML4MO

    ML assists MO (the objective is optimization)
    1. Selection of the solution method, setting of its (hyper)parameters, and algorithm selection by ML
    2. ML/RL inside the optimization: branching rules and cutting planes are improved with ML (RL)
    3. Combinatorial optimization with reinforcement learning
    4. Generate data (instance and solution pairs) using MO and train ML on them
    Use ML to speed up MO for new problem instances
    1. ML returns solution hints and constraints to be satisfied, speeding up MO (MIPLearn)
    2. ML returns an approximate (infeasible) solution, which is converted to a nearby feasible solution (optimization proxy)
    3. ML returns equality constraints and the values of integer variables, from which a solution is generated (optimization voice)
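The "optimization proxy" idea, where ML returns an approximate and possibly infeasible solution that is then converted to a nearby feasible one, might look like the sketch below for a knapsack constraint. The scores stand in for the output of a trained model, and the repair rule (drop the least confident items, then refill greedily) is an illustrative assumption, not the repair layer of any particular paper or library.

```python
def repair(scores, weights, capacity):
    """Turn predicted item scores into a capacity-feasible selection."""
    # Round the prediction: keep items the model scores at 0.5 or above.
    chosen = [i for i, s in enumerate(scores) if s >= 0.5]
    chosen.sort(key=lambda i: scores[i])  # least confident first
    load = sum(weights[i] for i in chosen)

    # Repair: drop the least confident items until the capacity holds.
    while load > capacity and chosen:
        load -= weights[chosen.pop(0)]

    # Refill: add unselected items by decreasing score while they still fit.
    rest = [i for i in range(len(scores)) if i not in chosen]
    for i in sorted(rest, key=lambda i: -scores[i]):
        if load + weights[i] <= capacity:
            chosen.append(i)
            load += weights[i]
    return sorted(chosen), load

# Hypothetical model output and instance data.
scores = [0.9, 0.8, 0.6, 0.55, 0.2]
weights = [4, 3, 5, 4, 1]
selection, load = repair(scores, weights, capacity=10)
```

The rounded prediction selects items 0 to 3 with total weight 16, which violates the capacity of 10; the repair step restores feasibility without calling a solver.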
  12. Classification of ML4MO

    • ML returns solution (using MO to create training data)
    • ML before optimization (ML chooses parameter and algorithm)
    • ML inside optimization
    • RL: Solving combinatorial optimization problems with RL
    [Diagram labels: Instances, ML, MO, Parameter, Algorithm, Solution]
  13. Encode-Decode Method

    Instances → ML → An Encoding of Solution → Decoder → Solution
    Complexity of Encoding << Complexity of Solution
    Decoder: an algorithm to quickly recover the solution based on the encoding
    Example: Scheduling Problem
    • Priority of jobs + mode of jobs: decoding is an active schedule generation scheme
    • Priority of jobs: decoding is optimized using a scheduling solver, OptSeq
    Example: Delivery Planning Problem
    • Pre-cycle (decoding is dynamic programming)
    • Assignment to vehicles (with overlap for flexibility) + pre-cycle circuits (decoding is a recourse strategy)

    Method         Encoding                                           Decoder
    MIPLearn       Additional constraints                             MIP solver
    Opt. Proxies   Infeasible solutions                               Repair layers
    Opt. Voice     Equality constraints, values of integer variables  System of linear equations
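For the scheduling example, the encoding is a priority per job and the decoder rebuilds a full schedule from it. The sketch below uses plain list scheduling on identical parallel machines as a simplified stand-in for the active schedule generation scheme named on the slide; it is not OptSeq, and the durations and priorities are invented.

```python
import heapq

def decode(priorities, durations, n_machines):
    """Decode a priority vector into a schedule by list scheduling.

    Jobs are taken in decreasing priority order and each one starts on
    the machine that becomes free first.
    """
    order = sorted(range(len(durations)), key=lambda j: -priorities[j])
    machines = [(0, m) for m in range(n_machines)]  # (free time, machine id)
    heapq.heapify(machines)
    schedule = {}
    for j in order:
        free, m = heapq.heappop(machines)
        schedule[j] = (m, free, free + durations[j])  # machine, start, end
        heapq.heappush(machines, (free + durations[j], m))
    makespan = max(end for _, _, end in schedule.values())
    return schedule, makespan

durations = [3, 2, 7, 4, 5]  # hypothetical job durations

# Two encodings of the same instance: job-index order vs. longest job first.
_, makespan_index_order = decode([5, 4, 3, 2, 1], durations, n_machines=2)
_, makespan_longest_first = decode(durations, durations, n_machines=2)
```

The encoding has one number per job while the schedule carries a machine, a start time, and an end time per job, which is the "complexity of encoding << complexity of solution" point: an ML model only needs to predict the short vector.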
  14. Pattern 4 ML ⊃ MO, MO4ML

    MO assists ML (the main objective is machine learning)
    • Perform ML tasks (classification, regression) using (M)O
    • MO model for feature selection and MO model for optimal decision tree
    • Application of optimization methods for constrained ML models (e.g., Lagrangian relaxation)
    [Diagram: Data, MO, ML, Classification, Regression]
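As a toy instance of using optimization for an ML task, the sketch below finds a provably optimal depth-1 decision tree (a stump) by exhaustive enumeration of features, thresholds, and leaf labels. Real optimal-tree methods rely on MIP, dynamic programming, or branch and bound; enumeration is used here only because the hypothetical data set is tiny.

```python
def best_stump(X, y):
    """Return (errors, feature, threshold, left_label) with minimum errors.

    The rule predicts left_label when x[feature] <= threshold and
    1 - left_label otherwise. Labels are assumed to be 0 or 1.
    """
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for left_label in (0, 1):
                errors = sum(
                    (left_label if row[f] <= thr else 1 - left_label) != label
                    for row, label in zip(X, y)
                )
                candidate = (errors, f, thr, left_label)
                if best is None or candidate < best:
                    best = candidate
    return best

# Hypothetical training data: two features, binary labels.
X = [[1, 5], [2, 1], [3, 4], [4, 2], [5, 3]]
y = [0, 0, 0, 1, 1]
errors, feature, threshold, left_label = best_stump(X, y)
```

Because every candidate split is evaluated, the returned stump is globally optimal for training error, in contrast to greedy heuristics whose optimality is not guaranteed for deeper trees.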
  15. Open-source Packages for Optimal Decision Trees

    • https://github.com/LucasBoTang/Optimal_Classification_Trees Comparison of OCT (Optimal Classification Tree), BinaryOCT, flowOCT
    • MurTree https://bitbucket.org/EmirD/murtree/src/master/ Dynamic Programming
    • DL8.5 https://dl85.readthedocs.io/en/latest/user_guide.html Branch and Bound using Data Mining
    • https://github.com/pan5431333/pyoptree OCT and Local Search
  16. Pattern 5 Cross-pollination at the Foundational Level

    • ML and MO share the same optimization fundamentals
      ✓ Optimization in DL (nonconvex nonlinear optimization): Momentum, Adam, fit one cycle (experimental)
      ✓ Nondifferentiable optimization theory: Nesterov acceleration, theoretical convergence proofs
    • Mathematical optimization is used to interpret machine learning models mathematically
      ✓ Insight for improvements (add sparsity to models)
      ✓ Convergence proofs
      ✓ New model ideas
  17. Pattern 6 Fusion of Dynamic Models

    • Dynamic optimization problems where future information is either unavailable or uncertain
    • Blending the following disciplines:
      ✓ Dynamic Programming (DP), Approximate DP, Neuro-DP
      ✓ Reinforcement Learning (RL)
      ✓ Model Predictive Control (MPC)
      ✓ Multi-period stochastic or robust optimization with affine recourse function adjustment inspired by linear feedback in control
    [Diagram: Instances, MO, ML, RL/MPC, Solution]
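The MPC ingredient of this blend follows a rolling-horizon loop: forecast a few periods ahead, optimize over that finite horizon, apply only the first decision, then repeat. The sketch below shows the loop on a toy inventory problem with backlogging. The cost rates and order sizes are invented, the true demand is used as a perfect forecast in place of an ML model, and brute-force enumeration stands in for the finite-horizon optimizer.

```python
from itertools import product

HOLD, SHORT = 1.0, 5.0      # hypothetical per-unit holding / shortage costs
ORDER_CHOICES = (0, 5, 10)  # hypothetical admissible order quantities

def period_cost(stock):
    """Holding cost on positive stock, shortage cost on backlog."""
    return HOLD * stock if stock >= 0 else SHORT * -stock

def plan_cost(stock, orders, forecast):
    """Total cost of an order plan over the forecast horizon."""
    cost = 0.0
    for q, d in zip(orders, forecast):
        stock += q - d
        cost += period_cost(stock)
    return cost

def mpc_step(stock, forecast):
    """Optimize over the whole horizon, but return only the first order."""
    plans = product(ORDER_CHOICES, repeat=len(forecast))
    best_plan = min(plans, key=lambda plan: plan_cost(stock, plan, forecast))
    return best_plan[0]

def simulate(demand, horizon=3, stock=0):
    """Rolling-horizon loop: re-plan every period with a fresh forecast."""
    total = 0.0
    for t in range(len(demand)):
        forecast = demand[t:t + horizon]  # perfect-forecast stand-in for ML
        stock += mpc_step(stock, forecast) - demand[t]
        total += period_cost(stock)
    return total

total_cost = simulate([5, 10, 5])
```

Replacing the perfect forecast with a learned one and adding a terminal value estimate to `plan_cost` would move this loop toward the hybrid of forecasting, optimization, and value functions described on the slides.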
  18. Comparison of DP/RL/MPC/Stochastic (Robust) / MO Hybrid

    Columns: DP | Approx. DP | RL | MPC | Stochastic (Robust) Optimization | MO Hybrid
    Model: ◯ | ◯ | △ | ◯ | ◯ | ◯
    Forecasting: ◯ ◯; Forecast using past instances and contexts
    Value Function: ◯ ◯ (Post-Action State) ◯ ◯; Define on states and post-action states
    Optimization: Greedy One-period Optimization; Greedy; Tree Search, Beam Search, Rollout; Finite Horizon (Convex Quadratic Optimization); Stochastic (Robust) Optimization; (M)O on a finite horizon problem using here and now and recourse variables + ML
    Approximate Value Function: Piecewise Linear Functions; Deep Learning; Piecewise-linear, Neural Net, Decision Tree
    Adjustable Function: ◯ ◯ ◯
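  19. MO Hybrid = ML+(M)O+MPC+RL

    [Diagram components: (M)O, Forecast, Instance Generation, Solution, Training Data, ML, ML (Encode-Decode), ML (RL), MPC, State, Solution Encoding, State Value Func.]
    Period: $t-1,\ t,\ t+1,\ t+2,\ \dots,\ T,\ T+1,\ \dots$
    Instance: $(I_t^{\langle t \rangle}, \tilde{I}_{t+1}^{\langle t \rangle}, \tilde{I}_{t+2}^{\langle t \rangle}, \dots, \tilde{I}_T^{\langle t \rangle})$
    Solution: $(X_t^{\langle t \rangle}, X_{t+1}^{\langle t \rangle}, X_{t+2}^{\langle t \rangle}, \dots, X_T^{\langle t \rangle})$
    Forecast: $(\tilde{I}_{t+1}, \tilde{I}_{t+2}, \dots, \tilde{I}_T, \dots)$
    State: $I_i \ (i = \dots, t-1, t)$
    Training data: $(X_i^{\langle i \rangle}, X_{i+1}^{\langle i \rangle}, X_{i+2}^{\langle i \rangle}, \dots) \ (i = \dots, t-1)$ and $(I_i^{\langle i \rangle}, \tilde{I}_{i+1}^{\langle i \rangle}, \tilde{I}_{i+2}^{\langle i \rangle}, \dots)$
    Solution encoding: $S_{t-1}$, $S_i$
    State value function: $V(S_i)$, $V(S_{T+1})$; objective $\max\ v(x) + V(S_{T+1})$
    MPC: $(X_t^{\langle t \rangle}, X_{t+1}^{\langle t \rangle}, X_{t+2}^{\langle t \rangle}, \dots, X_{T-1}^{\langle t \rangle}) \approx (X_t^{\langle t-1 \rangle}, X_{t+1}^{\langle t-1 \rangle}, X_{t+2}^{\langle t-1 \rangle}, \dots, X_{T-1}^{\langle t-1 \rangle})$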
  20. Horizontal AI vs Vertical AI

    Horizontal AI: OpenAI GPT, Google Gemini, Anthropic Claude
    Vertical AI:
    • Health Care: Tempus, PathAI, Viz.ai, Aidoc, Abridge, …
    • Legal Sector: Harvey, EvenUp, Ironclad, Casetext, Everlaw, …
    • Financial Service: Zest AI, HighRadius, ThetaRay, BioCatch, …
    • Supply Chain Optimization
  21. AGI4OPT (Hearing => Opt. App.)

    https://www.moai-lab.jp/products/agi4opt
    User: Data Upload
    • Classification Agent: SC Network Design, Vehicle Routing, Production Scheduling, Inventory Policy, …
    • Hearing Agent
    • AMPL Modeling + Coding Agent
    • Code Execution Agent
    • Optimization Agent
    • Data Mapping/Visualization Agent (Agentic Data Scientist)
    • UI Generation Agent
    >= 10000 articles, >= 50 books
  22. MOAI Data Platform

    Agentic IBP; ERP Data; Data Mapping
    • Production Scheduling
    • Lot Sizing
    • Logistics/Service Network Design
    • Vehicle Routing
    • Supply Chain Risk Optimization
    • Stage BOM
    • Safety Stock Allocation
    • Inventory Policy Optimization
    • Shift Scheduling