is scaling problem? • Matrix scaling and operator scaling with applications Part II: information geometry and scaling • My recent work (with Takeru Matsuda) • Operator scaling and quantum information geometry /
positive diagonal matrices L ∈ Rm ++ , R ∈ Rn ++ s.t. (LAR) n = m m and (LAR) m = n n Applications • Markov chain estimation [Richard Sinkhorn, 6 ] • Contingency table analysis [Morioka and Tsuda, ] • Optimal transport [Peyré and Cuturi, ] /
m = n/n. A( ) = A A( k+ ) = m Diag(A( k) n)− A( k), A( k+ ) = n A( k+ ) Diag((A( k+ ) m)− . Theorem (Richard Sinkhorn ( 6 )) If A is a positive matrix, then there exists a solution and A(k) converges to a solution. /
statistcs 6 s Stone economics (RAS method) Sinkhorn formulation of “matrix scaling” Knopp Sinkhorn algorithm s Csiszár information theory s Wigderson et al. computer science s Cuturi machine learning /
min P≥O C, P s.t. P n = m m , P m = n n /m /n /m /n /m /n Cij i j Entropic regularization [Wilson, 6 ] min P≥O C, P − i,j Pij log(Pij) s.t. P n = m m , P m = n n −→ The optimal solution is unique and in the form of P = LAR, where L, R: nonnegative diagonal and Aij = exp(−Cij/ ) matrix scaling! • Heavily used in ML (Wasserstein distance) [Peyré and Cuturi, ] /
> Find: positive diagonal matrices L ∈ Rm ++ , R ∈ Rn ++ s.t. (L AR ) n − m m < and (L AR ) m − n n < • A is approx scalable def ⇐⇒ has solution for ∀ > . /
. . . .8 . . a b c support graph a b c perfect matching Theorem (R. Sinkhorn and Knopp ( 6 )) A is approx scalable ⇐⇒ support graph has perfect mathing Linear algebra Graph theory /
Cm×m is completely positive (CP) if Φ(X) = k i= AiXA† i for some A , ... , Ak ∈ Cm×n. • The dual map of the above CP map is Φ∗(X) = k i= A† i XAi • For nonsingular Hermitian matrices L, R, the scaled map ΦL,R is ΦL,R(X) = LΦ(R†XR)L† 6 /
Output: nonsingular Hermitian matrices L, R s.t. ΦL,R(In) = Im m and Φ∗ L,R (Im) = In n . Note: Changed constants from Gurvits’ original “doubly stochastic” formulation /
> Find: nonsingular Hermitian matrices L , R s.t. ΦL ,R (In) − Im m < and Φ∗ L ,R (Im) − In n < . • Φ is approx scalable def ⇐⇒ has solution for ∀ > . 8 /
, define Aij = √ aij ei e† j for i, j and CP map Φ(X) = i,j AijXA† ij = i,j aij ei e† j Xej e† i . Then, Φ(I) = i,j aij ei e† j ej e† i = i j aij ei e† i = Diag(A n) Φ∗(I) = i,j aij ej e† i ei e† j = j i aij ej e† j = Diag(A m) /
In/n. Φ( ) = Φ Φ( k+ ) = Φ( k) L,In where L = √ m Φ( k) (In)− / , Φ( k+ ) = Φ( k+ ) Im,R where R = √ n (Φ( k+ ))∗(Im)− / . Theorem (Gurvits ( )) If Φ is approximately scalable, Φ(k) convergences to a solution. /
· · + xk Ak (A , ... , Ak : matrices, x , ... , xk : scalar variables) Determine: det(A) = ? • If one can use randomness, it is easy! Can we do deterministically? −→ P vs BPP (major open problem in complexity theory) • Deep connection to combinatorial optimization and complexity theory [Edmonds, 6 ; Lovász, 8 ; Murota, ] • For some wide class of A (and over C), one can solve it by operator scaling [Gurvits, ] /
x x x x x x a b c support graph a b c perfect matching Theorem (Edmonds ( 6 )) Assume that A , ... , Ak have only one nonzero entries. Then, det(A) ≠ ⇐⇒ support graph has perfect mathing Linear algebra Graph theory /
x A + · · · + xk Ak , rank A ≤ min {n − dim U + dim( i AiU) : U ≤ Cn subspace} • If A , ... , Ak are rank-one, the equality holds. Furthermore, one can compute rank A and minimizer U by linear matroid intersection. • Lovász ( 8 ) also gave several families of A s.t. one can compute rank A via combinatorial optimization. Linear algebra Combinatorial optimization /
x A + · · · + xk Ak , consider CP map Φ(X) = k i= Ak XA† k . Then, [∃X O s.t. rank Φ(X) < rank X] =⇒ det A = Furtheremore, the former condition can be efficiently checked by operator scaling. • Gurvits ( ) gave several classes of A s.t. the converse is also true. /
• noncommutative rank (nc-rank) is the rank of A = A x + · · · + Ak xk where xi are noncommutative variables (i.e., xixj ≠ xjxi) • Highly nontrivial to define! (needs deep algebra, e.g. free-skew fields, matrix ring, ...) Theorem (Amitsur ( 66), Cohn ( ), Gurvits ( ), and Garg et al. ( )) nc-rank of A = min{n − dim U + dim( i AiU) : U ≤ Cn subspace} Furthermore, nc-rank can be computed via operator scaling. Noncommutative Edmonds problem is in P!! /
): tuple of matrices, Bi: ni × n, p = (p , ... , pk ) ∈ Rk ++ Theorem (Brascamp and Lieb ( 6)) There exists C ∈ ( , +∞] s.t. ∫ Rn k i= (fi(Bix))pi dx ≤ C · k i= ∫ Rni (fi(Bixi))dxi pi for all integrable nonnegative functions f , ... , fk . Includes Hölder, Loomis–Whitney, etc. 6 /
... , Bk ) and p, let BL(B, p) be the smallest C s.t. BL-inequality holds. Theorem (Lieb ( )) BL(B, p) = sup X ,...,Xk O k i= det(Xi) det( m i= piB† i XiBi) / BL(B, p) can be computed by operator scaling [Garg et al., 8] /