& Jagadees Rathinavel Department of Applied Mathematics Center for Interdisciplinary Scientific Computation Illinois Institute of Technology [email protected] mypages.iit.edu/~hickernell Thanks to the organizers, the GAIL team, NSF-DMS-1522687 and NSF-DMS-1638521 (SAMSI) SAMSI-Lloyds-Turing Workshop on Probabilistic Numerical Methods, April 11, 2018
Do We Stop? Compute the solution to a linear problem: sol(f) = $ ’ ’ ’ & ’ ’ ’ % ż Rd f(x) (x) dx Bayesian inference, financial risk, statistical physics, ... f(x) surrogate modeling for computer experiments, ... . . . Desired solution: An adaptive algorithm, app(¨, ¨) of the form app(f, ε) = w0,n + n ÿ i=1 wi,n f(xi ), where n, txiu∞ i=1 , w0,n , and w = (wi,n )n i=1 are chosen to guarantee sol(f) ´ app(f, ε) ď ε with high probability @ε ą 0, reasonable f 2/12
Probabilistic Numerics Approach Assume f „ GP(m, s2Cθ), a sample from a Gaussian process. Defining c = sol¨(sol¨¨(Cθ(¨, ¨¨))), c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T , C = Cθ(xi , xj ) n i,j=1 and choosing the weights as w0 = m[sol(1) ´ cTC´11], w = C´1c, app(f, ε) = w0 + wTf, f = f(xi ) n i=1 . yields an unbiased approximation: sol(f) ´ app(f, ε) ˇ ˇ f = y „ N 0, s2(c ´ cTC´1c) If n is chosen large enough to make 2.58sac ´ cTC´1c ď ε, then we are assured that Pf [|sol(f) ´ app(f, ε)| ď ε] ě 99%. 3/12
Probabilistic Numerics Approach Assume f „ GP(m, s2Cθ), a sample from a Gaussian process. Defining c = sol¨(sol¨¨(Cθ(¨, ¨¨))), c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T , C = Cθ(xi , xj ) n i,j=1 and choosing the weights as w0 = m[sol(1) ´ cTC´11], w = C´1c, app(f, ε) = w0 + wTf, f = f(xi ) n i=1 . yields an unbiased approximation: sol(f) ´ app(f, ε) ˇ ˇ f = y „ N 0, s2(c ´ cTC´1c) If n is chosen large enough to make 2.58sac ´ cTC´1c ď ε, then we are assured that Pf [|sol(f) ´ app(f, ε)| ď ε] ě 99%. There are issues requiring attention. 3/12
Likelihood Estimation Minimize minus twice the log likelihood observed with f = y: 2n log(s) + log(det(C)) + s´2(y ´ m1)TC´1(y ´ m1) first with respect to m, then s, then θ: mMLE = 1TC´1y 1TC´11 , s2 MLE = 1 n yT C´1 ´ C´111TC´1 1TC´11 y θMLE = argmin θ " n log yT C´1 ´ C´111TC´1 1TC´11 y + log(det(C)) * 4/12
Likelihood Estimation Minimize minus twice the log likelihood observed with f = y: 2n log(s) + log(det(C)) + s´2(y ´ m1)TC´1(y ´ m1) first with respect to m, then s, then θ: mMLE = 1TC´1y 1TC´11 , s2 MLE = 1 n yT C´1 ´ C´111TC´1 1TC´11 y θMLE = argmin θ " n log yT C´1 ´ C´111TC´1 1TC´11 y + log(det(C)) * Stopping criterion becomes 2.58 g f f f f e c ´ cTC´1c n looooooomooooooon depends on design yT C´1 ´ C´111TC´1 1TC´11 y looooooooooooooomooooooooooooooon depends on data ď ε, 4/12
Likelihood Estimation mMLE = 1TC´1y 1TC´11 , s2 MLE = 1 n yT C´1 ´ C´111TC´1 1TC´11 y θMLE = argmin θ " n log yT C´1 ´ C´111TC´1 1TC´11 y + log(det(C)) * Stopping criterion becomes 2.58 g f f f f e c ´ cTC´1c n looooooomooooooon depends on design yT C´1 ´ C´111TC´1 1TC´11 y looooooooooooooomooooooooooooooon depends on data ď ε, Q1: How large a family of kernels, Cθ is needed in practice to be confident in the error bound? Q2: Are the better ways of finding the right θ, say cross-validation? Q3: Can we check out assumption that f really comes from a Gaussian process? If not, are there alternatives 4/12
Discrepancy Sampling Suppose that the domain is [0, 1]d. Low discrepancy sampling places the xi more evenly than IID sampling IID points Sobol’ points Integration lattice points ¨¨¨ Dick, J. et al. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013), H., F. J. et al. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics. https://www.samsi.info/programs-and-activities/year-long-research-programs/2017-18-program-quasi- monte-carlo-high-dimensional-sampling-methods-applied-mathematics-qmc/. 5/12
Kernels that Match the Design Suppose that the covariance kernel, Cθ , and the design, txiun i=1 , have special properties: C = Cθ(xi , xj ) n i,j=1 = C1 , . . . , Cn = 1 n VΛVH, VH = nV´1, Λ = diag(λ1 , . . . , λn ) = diag(λ) V = V1 ¨ ¨ ¨ Vn = v1 ¨ ¨ ¨ vn T , V1 = v1 = 1 Suppose that VTz is a fast transform (O(n log n) cost) applied to z. Then it follows that λ = VTC1 is fast, C´11 = 1 λ1 Let y be the observed function values. Recall c = sol¨(sol¨¨(Cθ(¨, ¨¨))), and let ^ y = VTy, p c = VTc, c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T Then using the MLE estimates, the approximate solution and the stopping criterion become: app(f, ε) = ^ y1 sol(1) n + n ÿ i=2 ^ y˚ i p ci λi , 2.58 g f f e c ´ 1 n n ÿ i=1 | p ci |2 λi 1 n2 n ÿ i=2 | p yi |2 λi ď ε 6/12
Kernels that Match the Design c = sol¨(sol¨¨(Cθ(¨, ¨¨))), C = Cθ(xi , xj ) n i,j=1 = 1 n VΛVH, Λ = diag(λ), V1 = v1 = 1 VTz is O(n log n), λ = VTC1 , C´11 = 1 λ1 ^ y = VTy, p c = VTc, c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T app(f, ε) = ^ y1 sol(1) n + n ÿ i=2 ^ y˚ i p ci λi , 2.58 g f f e c ´ 1 n n ÿ i=1 | p ci |2 λi 1 n2 n ÿ i=2 | p yi |2 λi ď ε θMLE = argmin θ # n log n ÿ i=2 | p yi |2 λi + n ÿ i=1 log(λi ) + For integration with respect to a density and our special kernels, sol(1) = 1, c = 1, and c = 1: app(f, ε) = ^ y1 n , 2.58 g f f e 1 ´ n λ1 1 n2 n ÿ i=2 | p yi |2 λi ď ε 7/12
Kernels that Match the Design c = sol¨(sol¨¨(Cθ(¨, ¨¨))), C = Cθ(xi , xj ) n i,j=1 = 1 n VΛVH, Λ = diag(λ), V1 = v1 = 1 VTz is O(n log n), λ = VTC1 , C´11 = 1 λ1 ^ y = VTy, p c = VTc, c = sol¨(Cθ(¨, x1 )), . . . , sol¨(Cθ(¨, xn )) T app(f, ε) = ^ y1 sol(1) n + n ÿ i=2 ^ y˚ i p ci λi , 2.58 g f f e c ´ 1 n n ÿ i=1 | p ci |2 λi 1 n2 n ÿ i=2 | p yi |2 λi ď ε θMLE = argmin θ # n log n ÿ i=2 | p yi |2 λi + n ÿ i=1 log(λi ) + For integration with respect to a density and our special kernels, sol(1) = 1, c = 1, and c = 1: app(f, ε) = ^ y1 n , 2.58 g f f e 1 ´ n λ1 1 n2 n ÿ i=2 | p yi |2 λi ď ε Q4: How do we avoid round-off in evaluating 1 ´ n/λ1 ? 7/12
of Matching Covariance Kernels Typically the domain of f is [0, 1)d, and C(x, t) = # r C(x ´ t mod 1) integration lattices r C(x ‘ t) Sobol’ sequences, ‘ means digitwise addition modulo 2 E.g., for integration lattices C(x, t) = d ź k=1 [1 ´ θ1 (´1)θ2 B2θ2 (xk ´ tk mod 1)], θ1 ą 0, θ2 P N 8/12
of Matching Covariance Kernels Typically the domain of f is [0, 1)d, and C(x, t) = # r C(x ´ t mod 1) integration lattices r C(x ‘ t) Sobol’ sequences, ‘ means digitwise addition modulo 2 E.g., for integration lattices C(x, t) = d ź k=1 [1 ´ θ1 (´1)θ2 B2θ2 (xk ´ tk mod 1)], θ1 ą 0, θ2 P N Q5: How do we periodize f to take advantage of integration lattices and smoother covariance kernels? Does this even work for function approximation? Q6: May we get higher order convergence with higher order nets and smoother kernels? What do those kernels look like? Q7: Are low discrepancy designs also good for function approximation? Q8: Are there other kernel/design combinations that expedite vector-matrix operations? 8/12
of Questions Q1: How large a family of kernels, Cθ is needed in practice to be confident in the error bound? Q2: Are the better ways of finding the right θ, say cross-validation? Q3: Can we check out assumption that f really comes from a Gaussian process? If not, are there alternatives Q4: How do we avoid round-off in evaluating 1 ´ n/λ1 ? Q5: How do we periodize f to take advantage of integration lattices and smoother covariance kernels? Does this even work for function approximation? Q6: May we get higher order convergence with higher order nets and smoother kernels? What do those kernels look like? Q7: Are low discrepancy designs also good for function approximation? Q8: Are there other kernel/design combinations that expedite vector-matrix operations? Q9: Is this adaptive Bayesian algorithm competitive with others? Q10: What other problems are amenable to matched kernels and designs beyond cubature and function approximation? 11/12
J., Kuo, F. & Sloan, I. H. High dimensional integration — the Quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013). H., F. J., Kuo, F. Y., L’Ecuyer, P. & Owen, A. B. SAMSI Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applied Mathematics. https://www.samsi.info/programs-and-activities/year-long-research- programs/2017-18-program-quasi-monte-carlo-high-dimensional-sampling-methods- applied-mathematics-qmc/. 12/12