Arshak Minasyan

Recent advances in robust mean estimation in high dimensions
Arshak Minasyan, CREST-ENSAE, IP Paris
S3 – The Paris-Saclay Signal Seminar, March 8, 2023
Based on joint works with Arnak Dalalyan (CREST-ENSAE, IP Paris), Amir-Hossein Bateni (Université Grenoble Alpes), and Nikita Zhivotovskiy (University of California, Berkeley).
arshak minasyan, robust mean estimation in high dimensions

classical mean estimation setting
We observe $n$ vectors from $\mathbb{R}^d$ such that $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}_d(\mu^*, \Sigma)$ with unknown values of $\mu^*$ and $\Sigma$.
The celebrated Borell, Tsirelson-Ibragimov-Sudakov Gaussian concentration inequality states that, with probability at least $1 - \delta$,
$$\Big\| \frac{1}{n} \sum_{i=1}^n X_i - \mu^* \Big\|_2 \le \sqrt{\frac{\mathrm{Tr}(\Sigma)}{n}} + \sqrt{\frac{2\|\Sigma\| \log(1/\delta)}{n}}.$$
Question: What happens if some "small" fraction (denoted by $\varepsilon$) of the data is contaminated?
▶ Statistical guarantees
▶ Computational aspects
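To get a feel for the concentration bound, the sketch below evaluates its right-hand side and compares it with the error of the sample mean on one simulated outlier-free sample. The function name `gaussian_mean_bound`, the diagonal covariance, and the values of `n`, `d`, `delta` are illustrative assumptions, not part of the talk.

```python
import numpy as np

def gaussian_mean_bound(Sigma, n, delta):
    """Right-hand side sqrt(Tr(Sigma)/n) + sqrt(2 ||Sigma|| log(1/delta) / n)."""
    Sigma = np.asarray(Sigma, dtype=float)
    trace = np.trace(Sigma)
    op_norm = np.linalg.norm(Sigma, 2)  # largest singular value = operator norm
    return np.sqrt(trace / n) + np.sqrt(2.0 * op_norm * np.log(1.0 / delta) / n)

# One simulated outlier-free sample (all numbers below are arbitrary choices).
rng = np.random.default_rng(0)
d, n, delta = 20, 500, 0.05
Sigma = np.diag(np.linspace(1.0, 0.1, d))
mu_star = np.zeros(d)
X = rng.multivariate_normal(mu_star, Sigma, size=n)

err = np.linalg.norm(X.mean(axis=0) - mu_star)
bound = gaussian_mean_bound(Sigma, n, delta)
print(f"error of the sample mean: {err:.4f}, bound: {bound:.4f}")
```

The bound holds with probability at least $1 - \delta$, so a single run is only a sanity check, not a verification.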

contamination models
$$\mathcal{M}_{\mathrm{OF}} = \big\{ P_\mu^{n} : \mu \in M \big\}.$$
$$\mathcal{M}_{\mathrm{HC}}(\varepsilon) = \big\{ \big((1 - \varepsilon) P_\mu + \varepsilon Q\big)^{n} : \mu \in M,\ Q \in \mathcal{P} \big\},$$
$$\mathcal{M}_{\mathrm{PC}}(o) = \big\{ P_{\mu_1} \otimes \cdots \otimes P_{\mu_n} : \mu_i \in M,\ \exists\, O \subset [n] \text{ s.t. } |O| \le o,\ \mu_i = \mu_j\ \forall i, j \in O^c \text{ and } \mu_i \ne \mu_j\ \forall i \in O \text{ and some } j \in O^c \big\}.$$
$$\mathcal{M}_{\mathrm{AC}}(o) = \big\{ \sigma\big(P_\mu^{n-o} \otimes P_1 \otimes \cdots \otimes P_o\big) : \mu \in M,\ \sigma \in S_n \text{ (permutation group) and } P_1, \dots, P_o \text{ are arbitrary} \big\}.$$

relation between contamination models
[Figure: diagram relating the models $\mathcal{M}_{\mathrm{OF}}$, $\mathcal{M}_{\mathrm{HC}}$, $\mathcal{M}_{\mathrm{PC}}$, $\mathcal{M}_{\mathrm{AC}}$]

(sub-)Gaussian adversarial contamination
Definition (Gaussian Adversarial Contamination)
Let $Y_1, \dots, Y_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}_d(\mu^*, \Sigma)$. Then the contaminated sample $X_1, \dots, X_n$ is distributed according to $\mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$ with $\varepsilon \in (0, 1/2)$ when
$$\big|\{ i : X_i \ne Y_i \}\big| \le \varepsilon n.$$
Definition (Sub-Gaussian Adversarial Contamination)
Let $\xi_1, \dots, \xi_n \overset{\text{ind.}}{\sim} \mathrm{SG}_d(\tau)$. Then the contaminated sample $X_1, \dots, X_n$ is distributed according to $\mathrm{SGAC}(\mu^*, \Sigma, \varepsilon, \tau)$ when
$$\big|\{ i \in \{1, \dots, n\} : X_i \ne \mu^* + \Sigma^{1/2} \xi_i \}\big| \le \varepsilon n.$$
Outliers: $O = \{ i : X_i \ne Y_i \}$. Inliers: $I = \{1, \dots, n\} \setminus O$.
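The GAC definition can be illustrated by a small sampler: draw the Gaussian points $Y_i$, then overwrite at most $\varepsilon n$ of them. In the sketch below the helper `sample_gac` and the constant-shift outlier rule are hypothetical choices made only for illustration; in the adversarial model the replaced points may be arbitrary and may depend on the whole sample.

```python
import numpy as np

def sample_gac(n, mu_star, Sigma, eps, rng, shift=50.0):
    """Draw Y_i ~ N(mu_star, Sigma), then overwrite at most eps * n rows.

    The outlier rule (a constant shift) is an arbitrary illustrative choice;
    an adversary could place the replaced points anywhere.
    """
    Y = rng.multivariate_normal(mu_star, Sigma, size=n)
    X = Y.copy()
    n_out = int(np.floor(eps * n))
    outliers = rng.choice(n, size=n_out, replace=False)
    X[outliers] = mu_star + shift
    return X, Y, np.sort(outliers)

rng = np.random.default_rng(1)
d = 5
X, Y, O = sample_gac(n=200, mu_star=np.zeros(d), Sigma=np.eye(d), eps=0.1, rng=rng)
I = np.setdiff1d(np.arange(200), O)                   # inliers
changed = np.flatnonzero(np.any(X != Y, axis=1))      # {i : X_i != Y_i}
print(len(changed), "points differ from the Gaussian sample")
```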

basic estimators
▶ Sample mean: $\widehat\mu_n = \frac{1}{n} \sum_{i=1}^n X_i$
Not robust! One point can make the risk arbitrarily large.
▶ Coordinate-wise / geometric median:
$$\widehat\mu_n = \mathrm{med}\{X_1, \dots, X_n\}, \qquad \widehat\mu^{\mathrm{GM}} \in \arg\min_{\mu \in \mathbb{R}^d} \sum_{i=1}^n \|X_i - \mu\|_2.$$
Robust! Optimal only in low dimensions.
▶ Tukey's median:
$$\widehat\mu^{\mathrm{TM}}_n = \arg\max_{\mu \in \mathbb{R}^d} D_n(\mu), \qquad D_n(\mu) = \inf_{u \in S^{d-1}} \sum_{i=1}^n \mathbf{1}(u^\top X_i \le u^\top \mu).$$
Robust and optimal only when the covariance is isotropic ($\Sigma = \sigma^2 I_d$), and it takes exponential time to compute!
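The "one point can ruin the mean" remark is easy to check numerically. The sketch below compares the sample mean, the coordinate-wise median, and a geometric median computed by Weiszfeld iterations. Weiszfeld's algorithm is a standard choice swapped in here for illustration, and the sample sizes and the outlier value are arbitrary assumptions. Tukey's median is omitted because of its computational cost.

```python
import numpy as np

def geometric_median(X, n_iter=200, tol=1e-9):
    """Weiszfeld iterations for argmin_mu sum_i ||X_i - mu||_2."""
    mu = X.mean(axis=0)
    for _ in range(n_iter):
        dist = np.linalg.norm(X - mu, axis=1)
        dist = np.maximum(dist, 1e-12)           # avoid division by zero
        w = 1.0 / dist
        new_mu = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

rng = np.random.default_rng(2)
n, d = 200, 3
X = rng.standard_normal((n, d))   # inliers centred at mu* = 0
X[0] = 1e6                        # a single gross outlier

mean_err = np.linalg.norm(X.mean(axis=0))
cw_median_err = np.linalg.norm(np.median(X, axis=0))
gm_err = np.linalg.norm(geometric_median(X))
print(mean_err, cw_median_err, gm_err)
```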

median of means
A simple estimator is median-of-means. It goes back to Nemirovsky, Yudin (1983), Jerrum, Valiant, and Vazirani (1986), Alon, Matias, and Szegedy (2002).
$$\widehat\mu^{\mathrm{MOM}}_{n,\delta} \triangleq \mathrm{med}\Big( \frac{1}{m} \sum_{j=1}^{m} X_j, \ \dots, \ \frac{1}{m} \sum_{j=(k-1)m+1}^{km} X_j \Big).$$
Let $\delta \in (0, 1)$, $k = 8\log(1/\delta)$ and $m = \frac{n}{8\log(1/\delta)}$. Then,
$$|\widehat\mu^{\mathrm{MOM}}_{n,\delta} - \mu^*| \le \sigma \sqrt{\frac{32 \log(1/\delta)}{n}}.$$
Further developments of the MOM approach, including heavy-tailed distributions in high dimensions: Depersin and Lecue (2021), Lugosi and Mendelson (2019), Minsker (2015), Hsu and Sabato (2013), etc.
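A minimal one-dimensional sketch of the median-of-means construction follows. Rounding $k$ up to an integer, capping it at $n$, and dropping the leftover $n - km$ points are assumptions made to keep the code runnable; the slide states $k$ and $m$ without rounding.

```python
import numpy as np

def median_of_means(x, delta):
    """Median of k block means, with k = ceil(8 log(1/delta)) blocks of size m = n // k."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = int(np.ceil(8.0 * np.log(1.0 / delta)))
    k = max(1, min(k, n))                 # keep at least one point per block
    m = n // k
    blocks = x[: k * m].reshape(k, m)     # the last n - k*m points are dropped
    return np.median(blocks.mean(axis=1))

# One huge value only spoils the block it falls into.
x = np.r_[np.zeros(11), 1e9]
print(np.mean(x), median_of_means(x, delta=np.exp(-0.3)))
```

With these inputs there are three blocks of four points, only one of which contains the large value, so the median of the block means is unaffected.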

trimmed mean
Arguably the oldest and most natural idea of robust statistics is the trimmed mean: Tukey and McLaughlin (1963), Huber and Ronchetti (1984), Bickel (1965), Stigler (1973).
Divide the contaminated sample of $2n$ points into two halves: $X_1, \dots, X_n$; $Y_1, \dots, Y_n$.
Let $\alpha = Y_{(\varepsilon n)}$ and $\beta = Y_{((1-\varepsilon) n)}$ be the empirical quantiles of the second half. Define
$$\widehat\mu_{2n} = \frac{1}{n} \sum_{i=1}^n \phi_{\alpha,\beta}(X_i), \qquad \phi_{\alpha,\beta}(x) = \begin{cases} \beta, & \text{if } x > \beta, \\ x, & \text{if } x \in [\alpha, \beta], \\ \alpha, & \text{if } x < \alpha. \end{cases}$$
Then, with probability at least $1 - \delta$ (given $n\varepsilon \sim \log(1/\delta)$),
$$|\widehat\mu_{2n} - \mu^*| \le 9\sigma \sqrt{\frac{\log(8/\delta)}{n}}.$$
Lugosi and Mendelson (2021) showed the optimality of the trimmed mean in high dimensions for distributions with two finite moments.
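The two-half construction can be sketched in a few lines. The helper `trimmed_mean` is hypothetical, and the way the order statistics $Y_{(\varepsilon n)}$, $Y_{((1-\varepsilon)n)}$ are mapped to array indices (rounding up, then shifting to 0-based indexing) is an assumption of this sketch.

```python
import numpy as np

def trimmed_mean(sample, eps):
    """Quantiles from the second half, clipped average of the first half."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size // 2
    X, Y = sample[:n], sample[n:2 * n]
    Y_sorted = np.sort(Y)
    # 0-based positions of the order statistics Y_(eps n) and Y_((1-eps) n)
    lo = max(int(np.ceil(eps * n)) - 1, 0)
    hi = min(int(np.ceil((1.0 - eps) * n)) - 1, n - 1)
    hi = max(hi, lo)
    alpha, beta = Y_sorted[lo], Y_sorted[hi]
    return np.mean(np.clip(X, alpha, beta))   # np.clip plays the role of phi_{alpha,beta}

X_half = np.array([-100.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 100.0])
Y_half = np.arange(8.0)
print(trimmed_mean(np.concatenate([X_half, Y_half]), eps=0.25))
```

In this toy input the quantiles are $\alpha = 1$ and $\beta = 5$, so the two extreme values of the first half are clipped to 1 and 5 before averaging.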

lower bound for Gaussian mean
Combining the lower bound from Chen et al. (2018) with the lower bound in the outlier-free regime, we have that
$$\inf_{\widehat\mu_n} \|\widehat\mu_n - \mu^*\|_2 \ge c \sqrt{\|\Sigma\|} \Big( \sqrt{\frac{r_\Sigma}{n}} + \varepsilon \Big)$$
holds with positive probability for some absolute constant $c > 0$, where $r_\Sigma \triangleq \frac{\mathrm{Tr}(\Sigma)}{\|\Sigma\|}$ is usually called the effective rank.
Chen et al. (2018) also proved that Tukey's median of Gaussians satisfies
$$\|\widehat\mu^{\mathrm{TM}}_n - \mu^*\|_2 \le C\sigma \Big( \sqrt{\frac{d + \log(1/\delta)}{n}} + \varepsilon \Big)$$
with probability at least $1 - \delta$ when $\Sigma = \sigma^2 I_d$.
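The effective rank can be much smaller than the dimension when the spectrum of $\Sigma$ decays. The sketch below computes $r_\Sigma$ and the rate $\sqrt{\|\Sigma\|}(\sqrt{r_\Sigma/n} + \varepsilon)$ with the absolute constant dropped. Both function names and the example spectrum are illustrative assumptions.

```python
import numpy as np

def effective_rank(Sigma):
    """r_Sigma = Tr(Sigma) / ||Sigma||, with ||.|| the operator norm."""
    Sigma = np.asarray(Sigma, dtype=float)
    return np.trace(Sigma) / np.linalg.norm(Sigma, 2)

def gaussian_rate(Sigma, n, eps):
    """sqrt(||Sigma||) * (sqrt(r_Sigma / n) + eps), ignoring the constant c."""
    op_norm = np.linalg.norm(np.asarray(Sigma, dtype=float), 2)
    return np.sqrt(op_norm) * (np.sqrt(effective_rank(Sigma) / n) + eps)

d = 100
Sigma_decay = np.diag(1.0 / np.arange(1, d + 1) ** 2)   # fast-decaying spectrum
print(effective_rank(np.eye(d)), effective_rank(Sigma_decay))
print(gaussian_rate(np.eye(d), n=1000, eps=0.05), gaussian_rate(Sigma_decay, n=1000, eps=0.05))
```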

lower bound for sub-Gaussian mean
Lugosi and Mendelson (2021) showed that in the family of sub-Gaussian distributions, it is indeed impossible to achieve a rate better than $\sqrt{\|\Sigma\|} \big( \sqrt{r_\Sigma / n} + \varepsilon \sqrt{\log(1/\varepsilon)} \big)$, i.e.,
$$\inf_{\widehat\mu_n} \|\widehat\mu_n - \mu^*\|_2 \ge c \sqrt{\|\Sigma\|} \Big( \sqrt{\frac{r_\Sigma}{n}} + \varepsilon \sqrt{\log(1/\varepsilon)} \Big)$$
holds with positive probability.

optimal robust Gaussian mean estimation
Theorem (M., Zhivotovskiy 2023+)
Assume $X_1, \dots, X_n \sim \mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$. Let $\varepsilon < c_1$. Then there is an estimator $\widehat\mu_n$ satisfying, with probability at least $1 - \delta$,
$$\|\widehat\mu_n - \mu^*\|_2 \le c_2 \sqrt{\|\Sigma\|} \Big( \sqrt{\frac{r_\Sigma}{n}} + \sqrt{\frac{\log(1/\delta)}{n}} + \varepsilon \Big),$$
where $c_1, c_2 > 0$ are some absolute constants.
The estimator is
$$\widehat\mu_n = \arg\min_{\nu \in \mathbb{R}^d} \sup_{v \in S^{d-1}} \big| \mathbb{E}_{\rho_v} \mathrm{Med}\big(\langle X_1, \theta \rangle, \dots, \langle X_n, \theta \rangle\big) - \langle \nu, v \rangle \big|,$$
where $\theta \sim \mathcal{N}_d(v, \beta^{-1} I_d)$ under $\rho_v$ and the parameter $\beta$ satisfies $r_\Sigma / 10 \le \beta \le 10\, r_\Sigma$.
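The inner quantity $\mathbb{E}_{\rho_v} \mathrm{Med}(\langle X_1, \theta \rangle, \dots, \langle X_n, \theta \rangle)$ can be approximated for one fixed direction $v$ by Monte Carlo over $\theta$. The sketch below does only that. It is not the full min-max estimator, which also requires a supremum over all directions, and the helper name, the number of draws, and the outlier values are assumptions made for illustration.

```python
import numpy as np

def smoothed_median(X, v, beta, n_draws, rng):
    """Monte Carlo estimate of E_{theta ~ N(v, I/beta)} Med(<X_1,theta>, ..., <X_n,theta>)."""
    d = X.shape[1]
    thetas = v + rng.standard_normal((n_draws, d)) / np.sqrt(beta)
    projections = X @ thetas.T                  # shape (n, n_draws)
    return np.median(projections, axis=0).mean()

rng = np.random.default_rng(4)
n, d = 300, 5
mu_star = np.arange(1.0, d + 1.0)
X = mu_star + rng.standard_normal((n, d))       # Sigma = I_d, so r_Sigma = d
X[:15] = 100.0                                  # 5% outliers (illustrative choice)
v = np.zeros(d)
v[0] = 1.0
estimate = smoothed_median(X, v, beta=float(d), n_draws=2000, rng=rng)
print(estimate, "vs <mu*, v> =", mu_star @ v)
```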

tuning the parameter β
The parameter $\beta$ is chosen by the Statistician in such a way that $r_\Sigma / 10 \le \beta \le 10\, r_\Sigma$.
To estimate $\beta$ we need to estimate both $\|\Sigma\|$ and $\mathrm{Tr}(\Sigma)$.
Abdalla and Zhivotovskiy (2022) provided an estimator $\widehat\omega$ such that $\|\Sigma\| / 4 \le \widehat\omega \le 4\|\Sigma\|$.
The estimation of $\mathrm{Tr}(\Sigma)$ reduces to mean estimation in $\mathbb{R}$, and using the estimator from Lugosi and Mendelson (2019) we have an estimator $\widehat\tau$ such that $\mathrm{Tr}(\Sigma) / 2 \le \widehat\tau \le 2\,\mathrm{Tr}(\Sigma)$.
Hence, this yields an estimator such that $r_\Sigma / 8 \le \widehat\tau / \widehat\omega \le 8\, r_\Sigma$.
An alternative approach for estimating $\beta$ would be to use Lepskii's method.
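The factor 8 comes from combining the two-sided bounds on the two estimators, written here as $\widehat\tau$ (trace) and $\widehat\omega$ (operator norm). The arithmetic is:

```latex
\[
\frac{\widehat\tau}{\widehat\omega}
  \le \frac{2\,\mathrm{Tr}(\Sigma)}{\|\Sigma\|/4}
  = 8\,\frac{\mathrm{Tr}(\Sigma)}{\|\Sigma\|}
  = 8\, r_\Sigma ,
\qquad
\frac{\widehat\tau}{\widehat\omega}
  \ge \frac{\mathrm{Tr}(\Sigma)/2}{4\,\|\Sigma\|}
  = \frac{1}{8}\,\frac{\mathrm{Tr}(\Sigma)}{\|\Sigma\|}
  = \frac{r_\Sigma}{8} .
\]
```

Since $[r_\Sigma/8,\ 8 r_\Sigma] \subset [r_\Sigma/10,\ 10 r_\Sigma]$, the ratio is an admissible choice of $\beta$.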

going beyond the Gaussian assumption
The Gaussian assumption does not play a crucial role. Our results extend to distributions with the following properties:
▶ Symmetry around the mean and spherical symmetry (the distribution of $\langle X - \mu, v \rangle / \sqrt{v^\top \Sigma v}$ does not depend on $v \in S^{d-1}$).
▶ For small enough $\varepsilon$ and some $c > 0$: $|F^{-1}(1/2 \pm \varepsilon) - F^{-1}(1/2)| \le c\varepsilon$.
▶ The density function $f$ is bounded away from zero by an absolute constant for all $x \in [F^{-1}(1/2 - \varepsilon), F^{-1}(1/2 + \varepsilon)]$.
▶ Sub-Gaussian tails.

computational aspects of smoothed median
Recall
$$\widehat\mu_n = \arg\min_{\nu \in \mathbb{R}^d} \sup_{v \in S^{d-1}} \big| \mathbb{E}_{\rho_v} \mathrm{Med}\big(\langle X_1, \theta \rangle, \dots, \langle X_n, \theta \rangle\big) - \langle \nu, v \rangle \big|.$$
▶ It takes exponential time in $d$ to compute the smoothed median of contaminated data $X_1, \dots, X_n$.
▶ For mean estimation, it is a challenging open problem to have a polynomial-time algorithm with a linear dependence on $\varepsilon$, even when $\Sigma = I_d$.

iteratively reweighted mean estimator
We consider the weighted sample mean
$$\bar X_w = \sum_{i=1}^n w_i X_i.$$
Goal: Find a $w$ to mimic $w^*_j \propto \mathbf{1}(j \in O^c)$ for all $j \in \{1, \dots, n\}$.
Notice that $\bar X_w - \mu^* = \psi_w$, where
$$\psi_w = \sum_{i \in I} \frac{w_i}{\|w_I\|_1} \zeta_i + \frac{1}{1 - \varepsilon_w} \sum_{i \in I^c} w_i (X_i - \bar X_w)$$
with $\varepsilon_w = \sum_{i \in I^c} w_i$, and $\zeta_1, \dots, \zeta_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}_d(0, \Sigma)$.
$$\|\bar X_w - \mu^*\|_2 \le \sup_{v \in B_2} v^\top \psi_w.$$
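The identity $\bar X_w - \mu^* = \psi_w$ follows from a short computation, sketched here under the assumptions that $w$ lies in the simplex (so $\sum_i w_i = 1$ and $\|w_I\|_1 = 1 - \varepsilon_w$) and that the inliers can be written as $X_i = \mu^* + \zeta_i$ for $i \in I$:

```latex
\[
\begin{aligned}
\bar X_w - \mu^*
  &= \sum_{i \in I} w_i \zeta_i + \sum_{i \in I^c} w_i (X_i - \mu^*) \\
  &= \sum_{i \in I} w_i \zeta_i + \sum_{i \in I^c} w_i (X_i - \bar X_w)
     + \varepsilon_w (\bar X_w - \mu^*) .
\end{aligned}
\]
% Move the last term to the left-hand side and divide by 1 - \varepsilon_w = \|w_I\|_1:
\[
\bar X_w - \mu^*
  = \sum_{i \in I} \frac{w_i}{\|w_I\|_1}\, \zeta_i
  + \frac{1}{1 - \varepsilon_w} \sum_{i \in I^c} w_i (X_i - \bar X_w)
  = \psi_w .
\]
```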

iteratively reweighted mean estimator
For any pair of vectors $w \in \Delta^{n-1}$ and $\mu \in \mathbb{R}^d$ we proved
$$\|\bar X_w - \mu^*\|_2 \le \sqrt{\varepsilon} \cdot G(w, \mu)^{1/2} + R(\zeta, I), \qquad G(w, \mu) = \lambda_{\max}\Big( \sum_{i=1}^n w_i (X_i - \mu)^{\otimes 2} - \Sigma \Big).$$
▶ $G(w, \mu)$ is indeed bi-convex in $(w, \mu)$.
▶ For a fixed value of $\mu$, minimizing $G(w, \mu)$ becomes an SDP.
▶ For a fixed value of $w \in \Delta^{n-1}$, $\bar X_w = \arg\min_{\mu \in \mathbb{R}^d} G(w, \mu)$.
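The last bullet can be checked numerically: for weights in the simplex, $\sum_i w_i (X_i - \mu)^{\otimes 2} = \sum_i w_i (X_i - \bar X_w)^{\otimes 2} + (\bar X_w - \mu)^{\otimes 2}$, so moving $\mu$ away from $\bar X_w$ adds a positive semidefinite term and cannot decrease $\lambda_{\max}$. The sketch below evaluates $G$ with NumPy. The function name `G` mirrors the slide, while the random data and weights are arbitrary assumptions. The weight-minimization step (the SDP) is not implemented here.

```python
import numpy as np

def G(w, mu, X, Sigma):
    """lambda_max( sum_i w_i (X_i - mu)(X_i - mu)^T - Sigma )."""
    centred = X - mu
    M = (w[:, None] * centred).T @ centred - Sigma
    return np.linalg.eigvalsh(M)[-1]       # eigenvalues come back in ascending order

rng = np.random.default_rng(3)
n, d = 30, 4
X = rng.standard_normal((n, d))
Sigma = np.eye(d)
w = rng.random(n)
w /= w.sum()                               # a point of the simplex
X_w = w @ X                                # weighted mean

best = G(w, X_w, X, Sigma)
values = [G(w, X_w + rng.standard_normal(d), X, Sigma) for _ in range(5)]
print(best, min(values))
```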

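algorithm for known ε and Σ
Alg. 1: Iteratively reweighted mean estimator
Input: data $X_1, \dots, X_n \in \mathbb{R}^d$, contamination rate $\varepsilon$ and $\Sigma$
Output: parameter estimate $\widehat\mu^{\mathrm{IR}}_n$
Initialize: compute $\widehat\mu^0$ as a minimizer of $\sum_{i=1}^n \|X_i - \mu\|_2$
Set $K = 0 \vee \Big\lceil \frac{\log(4 r_\Sigma) - 2\log(\varepsilon(1 - 2\varepsilon))}{2\log(1 - 2\varepsilon) - \log\varepsilon - \log(1 - \varepsilon)} \Big\rceil$.
For $k = 1 : K$
    Compute current weights: $w \in \arg\min_{(n - n\varepsilon)\|w\|_\infty \le 1} \lambda_{\max}\Big( \sum_{i=1}^n w_i (X_i - \widehat\mu^{k-1})^{\otimes 2} - \Sigma \Big)$.
    Update the estimator: $\widehat\mu^k = \bar X_w$.
EndFor
Return $\widehat\mu^{\mathrm{IR}}_n := \widehat\mu^K$.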
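The number of iterations $K$ depends only on $r_\Sigma$ and $\varepsilon$ and is small in practice. The sketch below evaluates the formula from Alg. 1. The helper name `num_iterations` is hypothetical, and rounding the ratio up to an integer is an assumption of this sketch. The denominator equals $\log\big((1-2\varepsilon)^2 / (\varepsilon(1-\varepsilon))\big)$, so it is positive only for $\varepsilon < (5 - \sqrt{5})/10$, and the code rejects other values.

```python
import math

def num_iterations(r_sigma, eps):
    """K = 0 v ceil( (log(4 r) - 2 log(eps (1 - 2 eps))) / (2 log(1 - 2 eps) - log eps - log(1 - eps)) )."""
    if not 0.0 < eps < (5.0 - math.sqrt(5.0)) / 10.0:
        raise ValueError("eps must lie in (0, (5 - sqrt(5)) / 10)")
    num = math.log(4.0 * r_sigma) - 2.0 * math.log(eps * (1.0 - 2.0 * eps))
    den = 2.0 * math.log(1.0 - 2.0 * eps) - math.log(eps) - math.log(1.0 - eps)
    return max(0, math.ceil(num / den))

for r_sigma, eps in [(10.0, 0.1), (100.0, 0.01)]:
    print(r_sigma, eps, num_iterations(r_sigma, eps))
```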

in-expectation bound for Gaussians
Theorem (M. & Dalalyan, 2022)
Assume that $X_1, \dots, X_n \sim \mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$. Let $\varepsilon < (5 - \sqrt{5})/10$ and $\mu^* \in \mathbb{R}^d$. Then the estimator $\widehat\mu^{\mathrm{IR}}_n$ satisfies
$$\|\widehat\mu^{\mathrm{IR}}_n - \mu^*\|_{L_2} \le \frac{10 \|\Sigma\|_{\mathrm{op}}^{1/2}}{1 - 2\varepsilon - \sqrt{\varepsilon(1 - \varepsilon)}} \Big( \sqrt{r_\Sigma / n} + \varepsilon \sqrt{\log(1/\varepsilon)} \Big).$$
▶ Small and explicit constant in front of the rate, compared to other estimators' constants, which are non-explicit or very large, e.g. around $10^7$.
▶ Effective rank of the covariance matrix instead of the dimension.
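The threshold $(5 - \sqrt{5})/10 \approx 0.276$ is where the factor $1 - 2\varepsilon - \sqrt{\varepsilon(1-\varepsilon)}$ in the denominator of the bound vanishes. A short check, for $\varepsilon \in (0, 1/2)$:

```latex
\[
1 - 2\varepsilon > \sqrt{\varepsilon(1 - \varepsilon)}
\iff (1 - 2\varepsilon)^2 > \varepsilon(1 - \varepsilon)
\iff 5\varepsilon^2 - 5\varepsilon + 1 > 0 .
\]
% The roots of 5 e^2 - 5 e + 1 are (5 +- sqrt(5)) / 10, i.e. about 0.276 and 0.724.
\[
\text{On } (0, 1/2): \quad 5\varepsilon^2 - 5\varepsilon + 1 > 0
\iff \varepsilon < \frac{5 - \sqrt{5}}{10} \approx 0.276 .
\]
```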

in-probability bound for sub-Gaussians
Theorem (M. & Dalalyan, 2022)
Assume that $X_1, \dots, X_n \sim \mathrm{SGAC}(\mu^*, \Sigma, \varepsilon, \tau)$. Let $\varepsilon < (5 - \sqrt{5})/10$ and $\mu^* \in \mathbb{R}^d$. Then the estimator $\widehat\mu^{\mathrm{IR}}_n$ satisfies, with probability at least $1 - \delta$,
$$\|\widehat\mu^{\mathrm{IR}}_n - \mu^*\|_2 \le \frac{A(\tau) \|\Sigma\|_{\mathrm{op}}^{1/2}}{1 - 2\varepsilon - \sqrt{\varepsilon(1 - \varepsilon)}} \Big( \sqrt{\frac{d + \log(4/\delta)}{n}} + \varepsilon \sqrt{\log(1/\varepsilon)} \Big).$$
▶ The constant $A(\tau)$ is not explicit and depends only on the variance proxy $\tau$.
▶ The dependence on $\varepsilon$ is optimal for sub-Gaussians.

unknown Σ
Theorem (M. & Dalalyan, 2022)
Assume that $X_1, \dots, X_n \sim \mathrm{SGAC}(\mu^*, \Sigma, \varepsilon, \tau)$ with unknown $\Sigma$. Let $\varepsilon < (5 - \sqrt{5})/10$ and $\mu^* \in \mathbb{R}^d$. Then the estimator $\widehat\mu^{\mathrm{IR}}_n$ satisfies, with probability at least $1 - \delta$,
$$\|\widehat\mu^{\mathrm{IR}}_n - \mu^*\|_2 \le \frac{A(\tau) \|\Sigma\|_{\mathrm{op}}^{1/2}}{1 - 2\varepsilon - \sqrt{\varepsilon(1 - \varepsilon)}} \Big( \sqrt{\frac{d + \log(1/\delta)}{n}} + \sqrt{\varepsilon} \Big).$$
▶ For unknown isotropic $\Sigma = \sigma^2 I_d$, the sub-Gaussian rates hold.
▶ Among computationally tractable estimators, the rate $\sqrt{\varepsilon}$ for general unknown $\Sigma$ is the best known in the literature.

mean estimator based on spectral dimension reduction (SDR)

in-probability bound for SDR estimator
Theorem (Bateni, M., Dalalyan, 2022)
Let $X_1, \dots, X_n \sim \mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$ with $\varepsilon \le \varepsilon^* \in (0, 1/2)$, and let $\delta \in (0, 1/2)$. Then, for an appropriately chosen threshold $t$ and some absolute constant $C$, the SDR estimator satisfies
$$\big\| \widehat\mu^{\mathrm{SDR}} - \mu^* \big\|_2 \le \frac{C \sqrt{\log d}}{1 - 2\varepsilon^*} \Big( \sqrt{\frac{r_\Sigma}{n}} + \varepsilon \sqrt{\log(2/\varepsilon)} + \sqrt{\frac{\log(1/\delta)}{n}} \Big)$$
with probability at least $1 - \delta$.
▶ The SDR algorithm is much faster than the one based on iterative reweighting.
▶ It does not require knowledge of $\varepsilon$.
▶ Its breakdown point is equal to 0.5.
▶ It has an additional factor $\sqrt{\log d}$.

unknown Σ
Theorem (Bateni, M., Dalalyan, 2022)
Let $X_1, \dots, X_n \sim \mathrm{GAC}(\mu^*, \Sigma, \varepsilon)$ with unknown $\Sigma$ and $\varepsilon \le \varepsilon^* \in (0, 1/2)$, and let $\delta \in (0, 1/2)$. Assume that an approximation $\widetilde\Sigma$ of $\Sigma$ satisfies $\|\Sigma^{-1/2} \widetilde\Sigma\, \Sigma^{-1/2} - I_d\| \le \gamma$ for some $\gamma \in (0, 1/2]$. Then, for an appropriately chosen threshold $t$ and some absolute constant $C$, the SDR estimator satisfies
$$\frac{\big\| \widehat\mu^{\mathrm{SDR}} - \mu^* \big\|_2}{\|\Sigma\|^{1/2}} \le \frac{C \sqrt{\log d}}{1 - 2\varepsilon^*} \Big( \sqrt{\frac{r_\Sigma}{n}} + \varepsilon \sqrt{\log(2/\varepsilon)} + \sqrt{\varepsilon\gamma} + \sqrt{\frac{\log(1/\delta)}{n}} \Big)$$
with probability at least $1 - \delta$.
▶ If $\gamma$ is at most of order $\varepsilon \log(1/\varepsilon)$, then we have the same rate as if we knew $\Sigma$.
▶ If $\gamma$ is of constant order, then we recover the best-known rate for computationally tractable estimators, i.e., $\sqrt{r_\Sigma / n} + \sqrt{\varepsilon}$.
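Both regimes can be read off from the extra term, taken here to be $\sqrt{\varepsilon\gamma}$ (the square root of the product, which is the reading consistent with the two bullets). A one-line check of each case, with constants ignored:

```latex
\[
\gamma \le \varepsilon \log(1/\varepsilon)
\;\Longrightarrow\;
\sqrt{\varepsilon\gamma} \le \sqrt{\varepsilon^2 \log(1/\varepsilon)}
  = \varepsilon \sqrt{\log(1/\varepsilon)} ,
\]
% so the extra term is of the same order as the term already present for known Sigma.
\[
\gamma \asymp 1
\;\Longrightarrow\;
\sqrt{\varepsilon\gamma} \asymp \sqrt{\varepsilon}
  \ge \varepsilon \sqrt{\log(2/\varepsilon)}
  \quad \text{for small } \varepsilon ,
\]
% so the contamination part of the rate becomes sqrt(eps).
```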

comparison
Source: Bateni, M., Dalalyan (2022)

references
A. Dalalyan and A. Minasyan. All-in-one robust estimator of the Gaussian mean. Annals of Statistics, 2022.
A.-H. Bateni, A. Minasyan, and A. Dalalyan. Nearly minimax robust estimator of the mean vector by iterative spectral dimension reduction. arXiv preprint arXiv:2204.02323, 2022.
A. Minasyan and N. Zhivotovskiy. Statistically optimal robust mean and covariance estimation for anisotropic Gaussians. arXiv preprint arXiv:2301.09024, 2023.
