Fred J. Hickernell
Department of Applied Mathematics
Center for Interdisciplinary Scientific Computation
Illinois Institute of Technology
[email protected] · mypages.iit.edu/~hickernell

Joint work with Yuhan Ding, Peter Kritzer, and Simon Mak
This work partially supported by NSF-DMS-1522687 and NSF-DMS-1638521 (SAMSI)

Happy Birthday to my brother Bob, Chief of the Quantum Electromagnetics Division at NIST
Thank you for the kind invitation

Los Alamos National Laboratory, July 31, 2019
Goal: Construct ALG such that, given a black box providing information about f : Ω ⊂ R^d → R,
  ‖f − ALG(f, ε)‖_G ≤ ε  ∀ε > 0, f ∈ H ⊆ F (Banach space)
Impossible for the infinite-dimensional Banach space H = F
Smoothness assumed by F speeds up ALG
Smoothness alone cannot save you from the curse of dimensionality, but a low effective-dimension structure can
Choosing H to be a cone, rather than a ball, paves the way for adaptive algorithms
Interesting design (where to sample) problems remain
Given: a black box providing noiseless information about f : Ω ⊆ R^d → R, e.g., function values or series coefficients, costing $(f) each; f ∈ F, where the definition of ‖·‖_F enshrines the smoothness assumptions; and an error tolerance ε.

Output: ALG(f, ε) (as a surrogate, for solving PDEs, for uncertainty quantification) that is
  Cheap to evaluate and manipulate
  Accurate: ‖f − ALG(f, ε)‖_G ≤ ε ∀ε > 0, f ∈ H ⊂ F, provably
  Efficient to construct

Approximation with a fixed computational budget:
  APP(f, n) = Σ_{i=1}^n L_i(f) g_{i,n}
where L_1(f), L_2(f), ... is the input function information, e.g., function values or series coefficients, and g_n = (g_{1,n}, ..., g_{n,n}) ∈ G^n
  COST(f, n) = O(n $(f) + COST(g_n))

Algorithm: ALG(f, ε) = APP(f, n*(f, ε)) satisfying
  ‖f − APP(f, n*(f, ε))‖_G ≤ ε  ∀ε > 0, f ∈ H ⊂ F
  COST(f, ε) = COST(f, n*(f, ε)) + the cost to determine n*(f, ε)
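To make the APP/ALG split concrete, here is a minimal sketch in Python. Everything in it is illustrative: the names info, basis, and stop_rule are hypothetical stand-ins for L_i, g_{i,n}, and the (as yet unspecified) rule that determines n*(f, ε); none of them come from the talk.

```python
# A minimal sketch of the APP / ALG structure above; all names are
# hypothetical stand-ins, not from the talk.

def APP(f, n, info, basis):
    """Fixed-budget approximation: sum_{i=1}^n L_i(f) g_{i,n}."""
    data = [info(f, i) for i in range(1, n + 1)]   # n pieces of information, cost n*$(f)
    return lambda x: sum(Li_f * basis(i, n)(x)
                         for i, Li_f in enumerate(data, start=1))

def ALG(f, eps, info, basis, stop_rule):
    """Adaptive algorithm: increase n until the stopping rule certifies eps."""
    n = 1
    while not stop_rule(f, n, eps):   # the cost of determining n*(f, eps)
        n += 1
    return APP(f, n, info, basis)     # n = n*(f, eps)
```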
No Guarantee for All f in Infinite Dimensional F

Claim: ‖f − ALG(f, ε)‖_G ≤ ε ∀f ∈ H ⊂ F is impossible for H = F.

Proof by contradiction. Suppose H = F and fix ε > 0. Let L_1, ..., L_n be the linear information used to construct ALG(0, ε). Choose a nonzero fooling function f ∈ F such that L_1(f) = ⋯ = L_n(f) = 0. Then ALG(±cf, ε) = ALG(0, ε) for all c > 0, and so for all c > 0,

  ε ≥ max(‖cf − ALG(cf, ε)‖_G, ‖−cf − ALG(−cf, ε)‖_G)
    ≥ ½ [‖cf − ALG(cf, ε)‖_G + ‖−cf − ALG(−cf, ε)‖_G]
    = ½ [‖cf − ALG(0, ε)‖_G + ‖cf + ALG(0, ε)‖_G]
    ≥ c ‖f‖_G   (triangle inequality).

Letting c → ∞ yields a contradiction. ∎
Smoothness Makes Approximation Less Expensive

For d = 1, let {u_0, u_1, ...} be an orthogonal (polynomial) basis for F and G:

  F := {f = Σ_{k=0}^∞ f̂(k) u_k : ‖f‖_F := ‖(f̂(k)/λ_k)_{k=0}^∞‖_2 < ∞},  λ_0 ≥ λ_1 ≥ ⋯ > 0
  G := {g = Σ_{k=0}^∞ ĝ(k) u_k : ‖g‖_G := ‖(ĝ(k))_{k=0}^∞‖_2 < ∞},  APP(f, n) = Σ_{k=0}^{n−1} f̂(k) u_k

  ‖f − APP(f, n)‖_G = ‖(f̂(k))_{k=n}^∞‖_2 = ‖(λ_k f̂(k)/λ_k)_{k=n}^∞‖_2 ≤ ‖f‖_F λ_n   (tight)

so forcing ‖f‖_F λ_n ≤ ε requires λ_n ↓ 0.

By choosing H = B_R := {f ∈ F : ‖f‖_F ≤ R}, we can define our algorithm:
  ALG(f, ε) = APP(f, n*) with n* = min{n : λ_n ≤ ε/R}  ⟹  ‖f − ALG(f, ε)‖_G ≤ ε ∀f ∈ B_R
  λ_n = O(n^{−1/p})  ⟹  COST(B_R, ε) = O(R^p ε^{−p})

ALG has optimal cost among all successful algorithms using Fourier coefficients (look at the cost of approximating the zero function). Similar results hold for algorithms based on function values, but then the design must be chosen carefully.
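A small numerical sketch of this ball-based algorithm, under assumptions of my own choosing: a Chebyshev basis, algebraic decay λ_k = k^{−r} for k ≥ 1 (with λ_0 = 1), and known R and r; none of these specific choices are from the talk.

```python
import numpy as np

# Illustrative d = 1 instance of the ball-based algorithm. Assumed (mine):
# Chebyshev basis u_k, lambda_k = k**(-r) for k >= 1, known ball radius R.

def n_star(eps, R, r):
    """n* = min{n : lambda_n <= eps/R} for lambda_n = n**(-r)."""
    return int(np.ceil((R / eps) ** (1.0 / r)))

def alg(f_hat, eps, R, r):
    """Truncate the given series coefficients at n* and return a surrogate."""
    n = n_star(eps, R, r)
    return np.polynomial.chebyshev.Chebyshev(f_hat[:n])

# Example input: f_hat(k) = k**(-r-1) for k >= 1, so f_hat(k)/lambda_k = 1/k
# and ||f||_F = (sum 1/k^2)^(1/2) ~ 1.28 <= R = 2, i.e., f lies in B_R.
r, R, eps = 2.0, 2.0, 1e-3
f_hat = np.concatenate(([0.0], np.arange(1.0, 1e5) ** (-r - 1)))
surrogate = alg(f_hat, eps, R, r)
print(n_star(eps, R, r), surrogate(0.3))   # n* = 45 for these parameters
```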
Smoothness Alone Cannot Save You from the Curse of Dimensionality¹

For arbitrary d, let {u_0 = 1, u_1} be used to construct a product basis for F and G (multilinear functions):

  F := {f(x) = Σ_{k∈{0,1}^d} f̂(k) u_k : ‖f‖_F := ‖(f̂(k)/λ_k)_{k∈{0,1}^d}‖_2 < ∞},  u_k(x) := Π_{ℓ=1}^d u_{k_ℓ}(x_ℓ)
  G := {g = Σ_{k∈{0,1}^d} ĝ(k) u_k : ‖g‖_G := ‖(ĝ(k))_{k∈{0,1}^d}‖_2 < ∞},  λ_k := Π_{ℓ : k_ℓ ≠ 0} s = s^{‖k‖_0}
  APP(f, n) = Σ_{i=1}^n f̂(k_i) u_{k_i},  1 = λ_{k_1} ≥ s = λ_{k_2} ≥ ⋯ ≥ s^d

ALG(f, ε) = APP(f, n*) with n* = min{n : λ_{k_{n+1}} ≤ ε/R}  ⟹  ‖f − ALG(f, ε)‖_G ≤ ε ∀f ∈ B_R
λ_{k_n} = O(n^{−1/p} e^{s^p d/p})  ⟹  COST(B_R, ε) = O(R^p ε^{−p} e^{s^p d})  ∀p: exponential growth in d

¹NovWoz08a.
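Because λ_k = s^{‖k‖_0} here, n*(ε) can be counted exactly via binomial coefficients, which makes the exponential growth in d easy to see numerically. The parameter values below are my own illustrative choices.

```python
from math import comb

# n*(eps) = #{k in {0,1}^d : lambda_k = s**j > eps/R}, where j = ||k||_0
# and each value of j occurs C(d, j) times. Parameters are illustrative.

def n_star(d, s, eps, R):
    return sum(comb(d, j) for j in range(d + 1) if s**j > eps / R)

for d in (5, 10, 20, 40):
    print(d, n_star(d, s=0.5, eps=1e-2, R=1.0))
# prints 32, 848, 60460, 4598479: the cost explodes with d
```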
Proof that λ_{k_{n+1}} = O(n^{−1/p} e^{s^p d/p}) ∀p > 0

  λ_{k_{n+1}}^p ≤ (1/n)(λ_{k_1}^p + ⋯ + λ_{k_n}^p)   (the λ_{k_i} are ordered)
  λ_{k_{n+1}} ≤ n^{−1/p} (λ_{k_1}^p + ⋯ + λ_{k_n}^p)^{1/p}   (take pth roots)
    ≤ n^{−1/p} (λ_{k_1}^p + ⋯ + λ_{k_{2^d}}^p)^{1/p}   (add the rest in)
    = n^{−1/p} (1 + s^p)^{d/p}   (binomial theorem)
    ≤ n^{−1/p} e^{s^p d/p}   (1 + x ≤ e^x for x ≥ 0)

There is a similar proof that provides a lower bound on λ_{k_{n+1}}.
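A quick numerical sanity check of this chain of inequalities; the values of d, s, and p are my own picks, not from the talk.

```python
import numpy as np
from math import comb, exp

# Check lambda_{k_{n+1}} <= n**(-1/p) * exp(s**p * d / p) for the multilinear
# case: the sorted lambda's are s**j with multiplicity C(d, j).
d, s, p = 12, 0.7, 1.0
lams = np.sort(np.repeat([s**j for j in range(d + 1)],
                         [comb(d, j) for j in range(d + 1)]))[::-1]
for n in (1, 10, 100, 1000):
    lhs = lams[n]                               # lambda_{k_{n+1}} (0-based)
    rhs = n ** (-1 / p) * exp(s**p * d / p)
    assert lhs <= rhs, (n, lhs, rhs)
    print(n, round(lhs, 4), round(rhs, 4))
```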
Decaying Coordinate Weights Can Save You¹

For arbitrary d, let {u_0 = 1, u_1} be used to construct a product basis for F and G (multilinear functions):

  F := {f(x) = Σ_{k∈{0,1}^d} f̂(k) u_k : ‖f‖_F := ‖(f̂(k)/λ_k)_{k∈{0,1}^d}‖_2 < ∞},  u_k(x) := Π_{ℓ=1}^d u_{k_ℓ}(x_ℓ)
  G := {g = Σ_{k∈{0,1}^d} ĝ(k) u_k : ‖g‖_G := ‖(ĝ(k))_{k∈{0,1}^d}‖_2 < ∞},  λ_k := Π_{ℓ : k_ℓ ≠ 0} w_ℓ s
  APP(f, n) = Σ_{i=1}^n f̂(k_i) u_{k_i},  1 = λ_{k_1} ≥ w_1 s = λ_{k_2} ≥ ⋯,  1 = w_1 ≥ w_2 ≥ ⋯

ALG(f, ε) = APP(f, n*) with n* = min{n : λ_{k_{n+1}} ≤ ε/R}  ⟹  ‖f − ALG(f, ε)‖_G ≤ ε ∀f ∈ B_R
λ_{k_n} = O(n^{−1/p} exp(p^{−1} s^p Σ_{ℓ=1}^d w_ℓ^p))  ⟹  COST(B_R, ε) = O(R^p ε^{−p} exp(s^p Σ_{ℓ=1}^d w_ℓ^p))  ∀p
The cost is independent of d if the coordinate weights decay quickly.

¹NovWoz08a.

Decaying Weights Can Save You, Even with Higher Order Polynomials¹

For arbitrary d, let {u_0 = 1, u_1, ...} be used to construct a product basis for F and G:

  F := {f(x) = Σ_{k∈N_0^d} f̂(k) u_k : ‖f‖_F := ‖(f̂(k)/λ_k)_{k∈N_0^d}‖_2 < ∞},  u_k(x) := Π_{ℓ=1}^d u_{k_ℓ}(x_ℓ)
  G := {g = Σ_{k∈N_0^d} ĝ(k) u_k : ‖g‖_G := ‖(ĝ(k))_{k∈N_0^d}‖_2 < ∞},  λ_k := Π_{ℓ : k_ℓ ≠ 0} w_ℓ s_{k_ℓ}
  APP(f, n) = Σ_{i=1}^n f̂(k_i) u_{k_i},  1 = λ_{k_1} ≥ λ_{k_2} ≥ ⋯,  1 = w_1 ≥ w_2 ≥ ⋯

ALG(f, ε) = APP(f, n*) with n* = min{n : λ_{k_{n+1}} ≤ ε/R}  ⟹  ‖f − ALG(f, ε)‖_G ≤ ε ∀f ∈ B_R
λ_{k_n} = O(n^{−1/p} exp(p^{−1} Σ_{k=1}^∞ s_k^p Σ_{ℓ=1}^d w_ℓ^p))  ⟹  COST(B_R, ε) = O(R^p ε^{−p} exp(Σ_{k=1}^∞ s_k^p Σ_{ℓ=1}^d w_ℓ^p))  ∀p
The cost is independent of d if the coordinate and smoothness weights decay quickly.

¹NovWoz08a.
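Ordering the λ_k is itself a computation. Here is a sketch of one way to do it for the weighted case: generate the n largest λ_k over N_0^d in decreasing order with a max-heap, exploiting the product structure instead of enumerating N_0^d. The weight sequences are my own illustrative choices; the multilinear slide's setting is recovered by stopping s after s_1.

```python
import heapq

# Sketch: the n largest lambda_k = prod_{l : k_l > 0} w[l] * s[k_l] over
# N_0^d, in decreasing order. Works because bumping any coordinate of k can
# only decrease lambda_k (w, s <= 1 and s decreasing), so a heap pops the
# lambda's in non-increasing order. Weights below are illustrative.

def lam(k, w, s):
    out = 1.0
    for l, kl in enumerate(k):
        if kl > 0:
            out *= w[l] * s[kl]
    return out

def largest_lambdas(n, d, w, s):
    start = (0,) * d
    heap, seen, out = [(-1.0, start)], {start}, []   # lambda_0 = 1
    while heap and len(out) < n:
        neg, k = heapq.heappop(heap)
        out.append((-neg, k))
        for l in range(d):                 # children: bump one coordinate
            kk = k[:l] + (k[l] + 1,) + k[l + 1:]
            if kk not in seen and kk[l] < len(s):
                seen.add(kk)
                heapq.heappush(heap, (-lam(kk, w, s), kk))
    return out

w = [(l + 1) ** -3 for l in range(8)]            # coordinate weights, d = 8
s = [1.0] + [2.0 ** -k for k in range(1, 12)]    # smoothness weights s_k
for lam_k, k in largest_lambdas(10, 8, w, s):
    print(round(lam_k, 6), k)
```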
The Case for Adaptive Algorithms

Goal: Construct ALG such that, given a black box providing information about f : Ω ⊂ R^d → R,
  ‖f − ALG(f, ε)‖_G ≤ ε  ∀ε > 0, f ∈ H ⊆ F (Banach space)

So far, H = B_R. But it is hard to know a priori how large R should be for your problem, and the computational cost depends on R and ε, not on the f data. Choosing H to be a cone instead makes adaptive algorithms possible².

²HicEtal17a, KunEtal19a, DinHic20a, RatHic19a.
Cone of Inputs Based on a Pilot Sample³

  F := {f = Σ_{i=1}^∞ f̂(k_i) u_{k_i} : ‖f‖_F := ‖(f̂(k_i)/λ_{k_i})_{i=1}^∞‖_2 < ∞},  λ_{k_1} ≥ λ_{k_2} ≥ ⋯ > 0  (λ affects the convergence rate & tractability)
  G := {g = Σ_{i=1}^∞ ĝ(k_i) u_{k_i} : ‖g‖_G := ‖ĝ‖_2 < ∞},  APP(f, n) = Σ_{i=1}^n f̂(k_i) u_{k_i}

  C_{d,λ,n_1,A} := {f ∈ F : ‖f‖_F ≤ A ‖(f̂(k_i)/λ_{k_i})_{i=1}^{n_1}‖_2}
(the pilot sample bounds the norm of the input; A is the inflation factor, n_1 the initial sample size)

  ‖f − APP(f, n)‖_G ≤ [A² ‖(f̂(k_i)/λ_{k_i})_{i=1}^{n_1}‖_2² − ‖(f̂(k_i)/λ_{k_i})_{i=1}^{n}‖_2²]^{1/2} λ_{k_{n+1}} =: ERR((f̂(k_i))_{i=1}^n, n)
(the bracketed factor is an upper bound on ‖f − Σ_{i=1}^n f̂(k_i) u_{k_i}‖_F, so the error bound is data-driven)

  ALG(f, ε) = APP(f, n*(f, ε)) for n*(f, ε) = min{n ∈ N : ERR((f̂(k_i))_{i=1}^n, n) ≤ ε}

  COST(ALG, C_{d,λ,n_1,A}, ε, R) = max{n*(f, ε) : f ∈ C_{d,λ,n_1,A} ∩ B_R} = min{n ≥ n_1 : λ_{k_{n+1}} ≤ ε/[(A² − 1)^{1/2} R]}

ALG is essentially optimal; its computational cost is independent of d if the λ_{k_n} decay quickly.

³DinEtal20a.
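A sketch of this data-driven stopping rule in Python. The coefficient oracle coef, the λ sequence, and the test input are assumed constructions of mine for illustration; the talk does not prescribe them.

```python
import numpy as np

# Sketch of the cone-based adaptive algorithm. Assumed for illustration:
# coef(i) returns the series coefficient fhat(k_{i+1}) (0-based black box),
# lam[i] = lambda_{k_{i+1}}, and (n1, A) are the cone parameters.

def err(ratios, n1, A, lam_next):
    """ERR = [A^2 S_{n1} - S_n]^{1/2} lambda_{k_{n+1}}, S_m = sum of first m ratios^2."""
    S_n1, S_n = np.sum(ratios[:n1] ** 2), np.sum(ratios**2)
    return np.sqrt(max(A**2 * S_n1 - S_n, 0.0)) * lam_next

def alg_cone(coef, lam, eps, n1, A):
    ratios = np.array([coef(i) / lam[i] for i in range(n1)])  # pilot sample
    n = n1
    while err(ratios, n1, A, lam[n]) > eps:    # lam[n] = lambda_{k_{n+1}}
        ratios = np.append(ratios, coef(n) / lam[n])
        n += 1
    return n   # n*(f, eps); APP(f, n) then uses these first n coefficients

# Example: fhat(k_i)/lambda_{k_i} = lambda_{k_i}, which puts f in the cone,
# since the pilot sample already captures nearly all of ||f||_F.
lam = (1.0 + np.arange(10_000)) ** -2.0
fhat = lam**2
print(alg_cone(lambda i: fhat[i], lam, eps=1e-3, n1=16, A=2.0))
```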
Function Values as Information

Goal: Construct ALG such that, given a black box providing information about f : Ω ⊂ R^d → R,
  ‖f − ALG(f, ε)‖_G ≤ ε  ∀ε > 0, f ∈ H ⊆ F (Banach space)

So far, the function information has been series coefficients, for which
  COST(f, ε) = O(n*(f, ε) $(f)),
the best one can hope for: the cost of constructing the approximation and determining the stopping sample size is essentially the same as that of getting the data.

But using series coefficients is not so realistic, and developing theory for multivariate function approximation using function values is challenging:
  One must bound the aliasing effects of using interpolation or other means to approximate the coefficients.
  Interpolation, reproducing kernel Hilbert space methods, and kriging typically require O(n³) operations to compute the approximation, perhaps more if one is tuning the kernel parameters; there are efforts to speed this up³.
  Space-filling designs such as integration lattices⁴, digital nets⁵, and sparse grids⁶ are promising.

³SchEtal19. ⁴DicEtal14a. ⁵DicPil10a. ⁶BunGrie04a.
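For concreteness, a minimal kernel-interpolation sketch whose dense linear solve is the O(n³) step mentioned above. The Gaussian kernel, its scale, the jitter, and the random design are all my own illustrative choices, not the talk's method.

```python
import numpy as np

# Minimal RKHS / kriging interpolation sketch; the dense n x n solve is the
# O(n^3) step. Kernel, scale, and design are illustrative assumptions.

def gauss_kernel(X, Y, scale=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * scale**2))

def fit_interpolant(X, y, scale=0.5):
    K = gauss_kernel(X, X, scale)
    c = np.linalg.solve(K + 1e-10 * np.eye(len(X)), y)   # O(n^3) dense solve
    return lambda Xnew: gauss_kernel(Xnew, X, scale) @ c

rng = np.random.default_rng(0)
X = rng.random((128, 3))                         # n = 128 points in [0,1]^3
f = lambda x: np.cos(2 * np.pi * x.sum(axis=-1))
app = fit_interpolant(X, f(X))
Xtest = rng.random((5, 3))
print(np.max(np.abs(app(Xtest) - f(Xtest))))     # error at a few test points
```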
Summary

Goal: Construct ALG such that, given a black box providing information about f : Ω ⊂ R^d → R,
  ‖f − ALG(f, ε)‖_G ≤ ε  ∀ε > 0, f ∈ H ⊆ F (Banach space)
Impossible for the infinite-dimensional Banach space H = F
Smoothness assumed by F speeds up ALG
Smoothness alone cannot save you from the curse of dimensionality, but a low effective-dimension structure can
Choosing H to be a cone, rather than a ball, paves the way for adaptive algorithms
Interesting design (where to sample) problems remain