
Geometry of Pre-contrast Functions

DeepFlow, Inc.
April 28, 2022


A contrast function is an asymmetric, squared-distance-like function on a manifold; on statistical models this notion includes divergences (also called relative entropies). A contrast function induces a (semi-)Riemannian metric and a pair of mutually dual, torsion-free affine connections. Pre-contrast functions were introduced with the motivation of constructing geometric theories for quantum information geometry and for the statistics of non-conservative systems; intuitively, a pre-contrast function is an asymmetric, direction-dependent distance-like function. In this talk, after reviewing contrast functions and the theory of affine immersions, which is useful for constructing them, we explain the geometry of pre-contrast functions and the theory of affine distributions.

https://connpass.com/event/241136/



Transcript

  1. Geometry of Pre-contrast Functions Hiroshi Matsuzoe Nagoya Institute of Technology

    1 Statistical manifolds and contrast functions 2 Affine immersions 3 Centroaffine immersions of codimension two 4 Quasi statistical manifolds and pre-contrast functions 5 Affine distributions 6 Generalized Pythagorean theorem (Quantum and Classical Physics and Geometry 2022, light version)
  2. Geometric pre-divergences 2 Half a Century of Information Geometry (IG)

    (40 years of duality in IG) • Information geometry is an interdisciplinary field that applies the techniques of differential geometry to study probability theory and statistics. It studies statistical manifolds, which are Riemannian manifolds whose points correspond to probability distributions. (Wikipedia (English Ver. 2022 Jan. 27)) • In a narrow sense, it refers to the differential geometry of dual (conjugate) affine connections. (Fujiwara (2015) (Wikipedia (Japanese Ver. 2021 Oct. 4))) • (I cannot read the Spanish version, but the explanation seems to focus more on the differential geometry of estimation.)
  3. Geometric pre-divergences 3 Half a Century of Information Geometry (IG)

    (40 years of duality in IG) Geometry of statistical models • Hotelling (1930), Rao (1945): The Fisher information matrix is a Riemannian metric. Conjugate connections • Blaschke (around 1920): Affine surface theory • Norden (1937, 1945), Sen (1944–1946): Parallel translations w.r.t. a pair of affine connections preserve a Riemannian metric. Chentsov (1972) (in Russian)   “Statistical Decision Rules and Optimal Inference”: Geometry of statistical models. — Invariance criterion — Non-Levi-Civita connections — Category of statistical decisions   Western statisticians were not aware of his results.
  4. Geometric pre-divergences 4 Half a Century of Information Geometry (IG)

    (40 years of duality in IG) Geometry of statistical models Conjugate connections Chentsov (1972) (in Russian)   “Statistical Decision Rules and Optimal Inference” — Invariance criterion — Non-Levi-Civita connections — Category of statistical decisions   • Efron (1975) pointed out the importance of curvature in statistical inference, and Dawid pointed out that the connection involved is not the Levi-Civita connection. Nagaoka-Amari (1982) (unpublished technical report)   “Differential geometry of smooth families of probability distributions” — re-discovered dual (conjugate) affine connections — applied dual affine connections to the geometry of statistical models   After the (re-)discovery of dual affine connections, IG has been applied to various fields of mathematical sciences.
  5. Geometric pre-divergences 5 Generalized Pythagorean theorem (cf. Nagaoka-Amari (1982), Chentsov

    (1968, 1972), Csiszár (1975)) (M, g, ∇, ∇∗) : a dually flat space D : M × M → [0, ∞) : a ∇-divergence on M (a non-symmetric squared-distance-like function) γ1 : ∇-geodesic connecting p and q (p, q, r ∈ M) γ2 : ∇∗-geodesic connecting q and r γ1 ⊥ γ2 at q with respect to g =⇒ D(p, r) = D(p, q) + D(q, r) (Figure: the points p, q, r joined by the ∇-geodesic γ1 and the ∇∗-geodesic γ2.)
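
A quick numerical sanity check of this identity (an illustrative sketch, assuming the toy potential ψ(θ) = Σ_i exp θi on R3; in the resulting dually flat space, ∇-geodesics are straight in θ and ∇∗-geodesics are straight in η = ∂ψ/∂θ):

    import numpy as np

    # Dually flat structure from psi(theta) = sum_i exp(theta^i):
    # dual coordinates eta = grad psi = exp(theta), metric g = Hess psi = diag(exp(theta)).
    psi = lambda th: np.sum(np.exp(th))
    eta = lambda th: np.exp(th)

    def D(th_p, th_q):
        # canonical (Bregman-type) nabla-divergence D(p, q)
        return psi(th_p) - psi(th_q) - eta(th_q) @ (th_p - th_q)

    rng = np.random.default_rng(0)
    th_q = rng.normal(size=3)            # the middle point q
    th_p = th_q + rng.normal(size=3)     # p, joined to q by a nabla-geodesic (straight in theta)

    # Choose r on a nabla*-geodesic through q (straight in eta) whose tangent at q is
    # g-orthogonal to the geodesic q -> p, i.e. <eta_r - eta_q, theta_p - theta_q> = 0.
    u = th_p - th_q
    w = rng.normal(size=3)
    w -= (w @ u) / (u @ u) * u                           # project out the component along u
    w *= 0.2 * np.min(eta(th_q)) / np.max(np.abs(w))     # keep eta_r positive
    th_r = np.log(eta(th_q) + w)                         # invert eta = exp(theta)

    print(D(th_p, th_r), D(th_p, th_q) + D(th_q, th_r))  # the two values agree

The check works because of the three-point identity D(p, q) + D(q, r) − D(p, r) = ⟨θp − θq, ηr − ηq⟩, whose right-hand side is exactly the g-inner product of the two tangent vectors at q.
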
  6. Geometric pre-divergences 6 Generalized projection theorem (cf. Nagaoka-Amari (1982), Chentsov

    (1968, 1972), Csiszár (1975)) (M, g, ∇, ∇∗) : a dually flat space D : M × M → [0, ∞) : a ∇-divergence on M S ⊂ M : a ∇∗-autoparallel (∇∗-totally geodesic) submanifold in M γ : ∇-geodesic connecting p and q (p ∈ M\S, q ∈ S) Then γ ⊥ S at q with respect to g ⇐⇒ D(p, q) = min_{r∈S} D(p, r)
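
The minimization characterization can be checked the same way. The sketch below (illustrative; the same toy potential ψ(θ) = Σ_i exp θi as above) minimizes D(p, ·) over an η-affine, hence ∇∗-autoparallel, plane S and then verifies the orthogonality at the minimizer:

    import numpy as np
    from scipy.optimize import minimize

    psi = lambda th: np.sum(np.exp(th))

    def D(th_p, th_q):
        return psi(th_p) - psi(th_q) - np.exp(th_q) @ (th_p - th_q)

    th_p = np.log(np.array([2.0, 1.0, 1.5]))   # the point p to be projected

    # S: a nabla*-autoparallel submanifold, i.e. an affine plane in eta-coordinates,
    # eta(s) = eta0 + s1*a1 + s2*a2.
    eta0 = np.array([1.0, 2.0, 3.0])
    A = np.array([[1.0, 0.0, -1.0],
                  [0.0, 1.0, -1.0]])           # rows a1, a2 span the eta-directions of S

    def objective(s):
        eta_r = eta0 + A.T @ s
        return np.inf if np.any(eta_r <= 0) else D(th_p, np.log(eta_r))

    res = minimize(objective, x0=np.array([0.5, 0.5]), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12})
    th_q = np.log(eta0 + A.T @ res.x)          # q = argmin_{r in S} D(p, r)

    # Orthogonality at q: the nabla-geodesic p -> q has tangent th_p - th_q in theta-
    # coordinates, and its g-inner product with an eta-direction a_i is <th_p - th_q, a_i>.
    print(A @ (th_p - th_q))                   # both components are ~ 0
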
  7. Geometric pre-divergences 7 1 Statistical manifolds and contrast functions We

    assume that all objects such as manifolds, Riemannian metrics, etc. are smooth in this talk. 1.1 Statistical manifolds M : a manifold (or an open domain in Rn) h : a (semi-)Riemannian metric on M ∇ : an affine connection on M   Definition 1.1 ∇∗ : the dual (or conjugate) connection of ∇ with respect to h def ⇐⇒ Xh(Y, Z) = h(∇X Y, Z) + h(Y, ∇∗ X Z) for X, Y, Z ∈ X(M)   (1) (∇∗)∗ = ∇ (2) ∇(0) := (1/2)(∇ + ∇∗) =⇒ ∇(0)h = 0
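
In local coordinates the duality condition can be rewritten as follows (a standard reformulation, recorded here for reference), with Γ_{ij,k} := h(∇_{∂i}∂j, ∂k) and Γ∗_{ij,k} := h(∇∗_{∂i}∂j, ∂k):

    \[
      \partial_i h_{jk}
        = h(\nabla_{\partial_i}\partial_j, \partial_k) + h(\partial_j, \nabla^{*}_{\partial_i}\partial_k)
        = \Gamma_{ij,k} + \Gamma^{*}_{ik,j},
      \qquad\text{hence}\qquad
      \Gamma^{*}_{ik,j} = \partial_i h_{jk} - \Gamma_{ij,k}.
    \]

In particular, h and ∇ determine the dual connection ∇∗ uniquely.
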
  8. Geometric pre-divergences 8 The torsion tensor T ∇ of ∇:

    T ∇(X, Y ) = ∇X Y − ∇Y X − [X, Y ]   Proposition 1.2 Consider the following four conditions: (1) ∇ is torsion-free. (2) ∇∗ is torsion-free. (3) C := ∇h is totally symmetric. (4) ∇(0) := (∇ + ∇∗)/2 is the Levi-Civita connection w.r.t. h. Any two of these conditions imply the remaining two.     Definition 1.3 (Kurose (1994)) Let (M, h) be a semi-Riemannian manifold with an affine connection ∇ on M. (M, ∇, h) : a statistical manifold def ⇐⇒ (1) ∇ is torsion-free (2) (∇X h)(Y, Z) = (∇Y h)(X, Z) C(X, Y, Z) := (∇X h)(Y, Z) is the cubic form of (M, ∇, h)  
  9. Geometric pre-divergences 9   Proposition 1.4 (M, h) :

    a semi-Riemannian manifold ∇(0) : the Levi-Civita connection with respect to h C : a totally symmetric (0, 3)-tensor field h(∇X Y, Z) := h(∇(0) X Y, Z) − (1/2)C(X, Y, Z), h(∇∗ X Y, Z) := h(∇(0) X Y, Z) + (1/2)C(X, Y, Z) =⇒ (1) ∇ and ∇∗ are affine connections, torsion-free and mutually dual with respect to h. (2) ∇h and ∇∗h are totally symmetric. (3) (M, ∇, h) and (M, ∇∗, h) are statistical manifolds.     Remark 1.5 (Original definition by S.L. Lauritzen) (M, g) : a Riemannian manifold C : a totally symmetric (0, 3)-tensor field We call the triplet (M, g, C) a statistical manifold.  
  10. Geometric pre-divergences 10   Remark 1.6 (Kurose (1990) (cf.

    Fujiwara (2015))) (M, g) : a Riemannian manifold ∇, ∇∗ : a pair of conjugate connections We call the quadruplet (M, g, ∇, ∇∗) a statistical manifold.   The curvature tensor R∇ of ∇: R∇(X, Y )Z = ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ] Z   h(R∇(X, Y )Z, W ) + h(Z, R∇∗ (X, Y )W ) = 0   This equation implies that R∇ = 0 ⇐⇒ R∇∗ = 0.   Definition 1.7 Let (M, ∇, h) be a statistical manifold. (M, ∇, h) is a Hessian manifold def ⇐⇒ R∇ = 0 and T ∇ = 0, i.e., ∇ is flat. ⇐⇒ (M, h, ∇, ∇∗) is a dually flat space.  
  11. Geometric pre-divergences 11 Example 1.8 (Normal distributions) For x ∈

    R, set pξ (x) = (1/√(2π(ξ2)2)) exp[ −(x − ξ1)2 / (2(ξ2)2) ] = (1/√(2πσ2)) exp[ −(x − µ)2 / (2σ2) ] S = {pξ | ξ = (ξ1, ξ2) = (µ, σ) ∈ R × (0, ∞)} Define the evaluation map by ex : S → R, ex (p) := p(x). Set l := log ◦ex gF p (X, Y ) := ∫_{−∞}^{∞} Xp (log ◦ex ) Yp (log ◦ex ) p(x)dx = Ep [(Xp l)(Yp l)] ( gF ij (pξ ) ) = (1/(ξ2)2) diag(1, 2) : the Fisher metric Cp (X, Y, Z) := Ep [(Xp l)(Yp l)(Zp l)] : the cubic form, Amari-Chentsov tensor field Denote by ∇(0) the Levi-Civita connection with respect to gF . We define the α-connection (α ∈ R) on S by gF (∇(α) X Y, Z) := gF (∇(0) X Y, Z) − (α/2) C(X, Y, Z) (S, ∇(α), gF ) is a statistical manifold. In particular, (S, ∇(1), gF ) and (S, ∇(−1), gF ) are Hessian manifolds, and thus (S, gF , ∇(1), ∇(−1)) is a dually flat space.
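
The diagonal form of the Fisher metric above is easy to confirm numerically. A minimal sketch (illustrative only), using the closed-form scores ∂µ log p = (x − µ)/σ2 and ∂σ log p = ((x − µ)2 − σ2)/σ3 of the normal family:

    import numpy as np
    from scipy.integrate import quad

    mu, sigma = 0.7, 1.3        # an arbitrary point (xi^1, xi^2) = (mu, sigma)

    def pdf(x):
        return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

    def score(x):
        # derivatives of log p with respect to mu and sigma
        return np.array([(x - mu) / sigma**2,
                         ((x - mu)**2 - sigma**2) / sigma**3])

    def fisher(i, j):
        # g^F_ij = E_p[(d_i log p)(d_j log p)]
        return quad(lambda x: score(x)[i] * score(x)[j] * pdf(x), -np.inf, np.inf)[0]

    G = np.array([[fisher(i, j) for j in range(2)] for i in range(2)])
    print(G)                                       # ~ diag(1/sigma^2, 2/sigma^2)
    print(np.diag([1 / sigma**2, 2 / sigma**2]))
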
  12. Geometric pre-divergences 12 A statistical model Se is an exponential

    family def ⇐⇒ Se = {pθ | θ ∈ Θ ⊂ Rn (an open subset)} where pθ (x) = exp[ Z(x) + ∑_{i=1}^{n} θi Fi (x) − ψ(θ) ] Z, F1 , · · · , Fn : functions on X ψ : a function on the parameter space Θ The coordinate system [θi] is called the natural parameters.   Proposition 1.9 For an exponential family Se , (1) ∇(e) := ∇(1) and ∇(m) := ∇(−1) are flat. (2) [θi] is a ∇(e)-affine coordinate, i.e., Γ(e) k ij ≡ 0. (3) Set ηi (p) = Ep [Fi ]. Then [ηi ] is a ∇(m)-affine coordinate, Γ(m) k ij ≡ 0.   gF ij (θ) = E[(∂i (log ◦ex ))(∂j (log ◦ex ))] = ∂i ∂j ψ(θ) : the Fisher metric CF ijk (θ) = E[(∂i (log ◦ex ))(∂j (log ◦ex ))(∂k (log ◦ex ))] = ∂i ∂j ∂k ψ(θ) : the cubic form, Amari-Chentsov tensor field The triplets (Se , ∇(e), gF ) and (Se , ∇(m), gF ) are Hessian manifolds, and thus (Se , gF , ∇(e), ∇(m)) is a dually flat space.
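
As a concrete instance of Proposition 1.9, the Bernoulli model is an exponential family with F1(x) = x and ψ(θ) = log(1 + e^θ). The sketch below (illustrative, not from the slides) checks numerically that η1 = Ep[F1] = ∂ψ/∂θ and that the Fisher information equals ∂2ψ/∂θ2:

    import numpy as np

    # Bernoulli model as an exponential family: p_theta(x) = exp(theta*x - psi(theta)), x in {0, 1}.
    psi  = lambda th: np.log1p(np.exp(th))
    prob = lambda th: 1.0 / (1.0 + np.exp(-th))    # P(x = 1)

    th, eps = 0.4, 1e-4

    # Expectation parameter eta_1 = E[F_1] versus the derivative of psi.
    print(prob(th), (psi(th + eps) - psi(th - eps)) / (2 * eps))

    # Fisher information g^F_11 = E[(d log p / d theta)^2] versus the second derivative of psi.
    p1 = prob(th)
    g_expect = (1 - p1)**2 * p1 + (0 - p1)**2 * (1 - p1)   # d log p / d theta = x - eta_1
    g_deriv  = (psi(th + eps) - 2 * psi(th) + psi(th - eps)) / eps**2
    print(g_expect, g_deriv)                               # both ~ p1 * (1 - p1)
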
  13. Geometric pre-divergences 13 1.2 Conformal-projective geometry   Definition 1.10

    Two statistical manifolds (M, ∇, h), (M, ¯∇, ¯h) are (1) conformally-projectively equivalent def ⇐⇒ There exist two functions ϕ and ψ on M such that ¯h(X, Y ) = eϕ+ψ h(X, Y ), ¯∇X Y = ∇X Y − h(X, Y )gradh ψ + dϕ(Y ) X + dϕ(X) Y (2) 1-conformally equivalent def ⇐⇒ There exists a function ψ on M such that ¯h(X, Y ) = eψ h(X, Y ), ¯∇X Y = ∇X Y − h(X, Y )gradh ψ where gradh ψ is defined by h(X, gradh ψ) = dψ(X). In this case the two connections ∇ and ¯∇ are said to be dual-projectively equivalent. (3) (−1)-conformally equivalent def ⇐⇒ There exists a function ϕ on M such that ¯h(X, Y ) = eϕ h(X, Y ), ¯∇X Y = ∇X Y + dϕ(Y ) X + dϕ(X) Y In this case the two connections ∇ and ¯∇ are said to be projectively equivalent.  
  14. Geometric pre-divergences 14 Remark 1.11 (In the case ϕ =

    ψ =⇒ 0-conformally equivalent) (M, g), (M, ¯g) : Riemannian manifolds ∇(0), ¯∇(0) : their Levi-Civita connections =⇒ ¯∇(0) X Y = ∇(0) X Y − g(X, Y )gradg ϕ + dϕ(Y ) X + dϕ(X) Y Proposition 1.12 (M, ∇, h) and (M, ¯∇, ¯h) are 1-conformally equivalent ⇐⇒ (M, ∇∗, h) and (M, ¯∇∗, ¯h) are (−1)-conformally equivalent.   Definition 1.13 (M, ∇, h) is conformally-projectively flat (resp. 1-conformally flat, (−1)-conformally flat) def ⇐⇒ (M, ∇, h) is locally conformally-projectively equivalent (resp. 1-conformally equivalent, (−1)-conformally equivalent) to some Hessian manifold. That is, for each point in M, ∃U ⊂ M : a neighborhood, ∃(U, ¯∇, ¯h) : a Hessian manifold such that (U, ∇|U , h|U ) and (U, ¯∇, ¯h) are conformally-projectively equivalent.  
  15. Geometric pre-divergences 15 Conformal-projective geometry and umbilical points (M, ∇,

    h) : a statistical manifold, n ≥ 3 N : a submanifold of M ν : the unit normal vector field along N ∇′ : the induced connection h′ : the induced metric (N, ∇′, h′) is a statistical submanifold. ∇X Y = ∇′ X Y + α(X, Y )ν ∇X ν = −β#(X) + τ(X)ν Set β(X, Y ) = h′(β#(X), Y ).   Definition 1.14 p ∈ N p : a tangentially umbilical point of N in (M, ∇, h) def ⇐⇒ ∃c : αp = ch′ p p : a normally umbilical point of N in (M, ∇, h) def ⇐⇒ ∃c : βp = ch′ p  
  16. Geometric pre-divergences 16   Theorem 1.15 (Kurose ’02) (M,

    ∇, h) and (M, ¯∇, ¯h) : simply connected statistical manifolds, dim M = n ≥ 3. (M, ∇, h) and (M, ¯∇, ¯h) are conformally-projectively equivalent ⇐⇒ (1) Ric(X, Y ) − Ric(Y, X) = ¯Ric(X, Y ) − ¯Ric(Y, X) (2) (∇, h) → ( ¯∇, ¯h) preserves the tangentially umbilical points and the normally umbilical points of any hypersurface of M.  
  17. Geometric pre-divergences 17 1.3 Contrast functions M : a manifold

    D : a function on M × M D[X1 · · · Xi |Y1 · · · Yj ] : a function on M defined by D[X1 · · · Xi |Y1 · · · Yj ](r) := (X1 )(p) · · · (Xi )(p) (Y1 )(q) · · · (Yj )(q) D(p, q)|p=r q=r For example, D[X| ](r) = X(p) D(p, q)|p=r q=r D[X|Y ](r) = X(p) Y(q) D(p, q)|p=r q=r D[XY |Z](r) = X(p) Y(p) Z(q) D(p, q)|p=r q=r . . .   Definition 1.16 D : M × M → R : a contrast function of M def ⇐⇒ (1) D(p, p) = 0, (2) D[X|] = D[|X] = 0, (3) h(X, Y ) := −D[X|Y ] : a semi-Riemannian metric on M.  
  18. Geometric pre-divergences 18   Proposition 1.17 For a contrast

    function D on M, we define ∇ and ∇∗ by h(∇X Y, Z) := −D[XY |Z], h(Y, ∇∗ X Z) := −D[Y |XZ] =⇒ (1) ∇, ∇∗ : torsion-free affine connections on M, mutually dual with respect to h. (2) ∇h, ∇∗h : symmetric (0, 3)-tensor fields. (3) (M, ∇, h), (M, ∇∗, h) : statistical manifolds.   Example 1.18 For M = Rn, set D(p, q) := (1/2)||p − q||2, (p, q) ∈ Rn × Rn. Then D is a contrast function on Rn. The contrast function D induces the standard Euclidean metric gE and the standard flat affine connection ∇E on Rn. That is, (Rn, ∇E, gE) is a Hessian manifold.
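
Example 1.18 can be verified directly from Definition 1.16 by finite differences: the sketch below (illustrative) evaluates h_ij(r) = −D[∂i|∂j](r) for D(p, q) = (1/2)||p − q||2 and recovers the Euclidean metric:

    import numpy as np

    D = lambda p, q: 0.5 * np.sum((p - q)**2)    # the contrast function of Example 1.18

    def metric(i, j, r, eps=1e-4):
        # h_ij(r) = -D[d_i | d_j](r): one derivative in the first slot, one in the second,
        # evaluated on the diagonal p = q = r (central mixed difference).
        ei, ej = np.eye(len(r))[i], np.eye(len(r))[j]
        val = ( D(r + eps*ei, r + eps*ej) - D(r + eps*ei, r - eps*ej)
              - D(r - eps*ei, r + eps*ej) + D(r - eps*ei, r - eps*ej) ) / (4 * eps**2)
        return -val

    r = np.array([0.3, -1.2, 2.0])
    h = np.array([[metric(i, j, r) for j in range(3)] for i in range(3)])
    print(h)    # ~ the identity matrix: the induced metric is the standard Euclidean one
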
  19. Geometric pre-divergences 19 Example 1.19 S = {p(x; θ)}: a

    statistical model Kullback-Leibler divergence, relative entropy   DKL (p(x; θ), p(x; θ′)) = ∫ p(x; θ) log( p(x; θ) / p(x; θ′) ) dx = Eθ [log p(x; θ) − log p(x; θ′)].   DKL [∂i |∂j ] = − ∫ ∂i p(θ) ∂′j log p(θ′) dx |θ=θ′ ( ∂i = ∂/∂θi , ∂′j = ∂/∂θ′j ) = − ∫ ∂i log p(θ) ∂′j log p(θ′) p(θ) dx |θ=θ′ = −gF ij : the Fisher metric DKL [∂i ∂j |∂k ] = − ∫ ( ∂i ∂j l(θ) ∂′k l(θ′) + ∂i l(θ) ∂j l(θ) ∂′k l(θ′) ) p(θ) dx |θ=θ′ = −Γ(m) ij,k : the mixture connection The KL-divergence induces the invariant statistical manifold (S, ∇(m), gF ).
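
The relation DKL[∂i|∂j] = −gF ij can also be checked numerically on a small discrete model; the sketch below is illustrative (it uses a two-parameter softmax family over three outcomes, an assumption of this example rather than part of the slides):

    import numpy as np

    def p(theta):
        # categorical distribution over 3 outcomes with natural parameters (theta1, theta2, 0)
        e = np.exp(np.array([theta[0], theta[1], 0.0]))
        return e / e.sum()

    def kl(th, th2):
        P, Q = p(th), p(th2)
        return np.sum(P * (np.log(P) - np.log(Q)))

    def mixed(i, j, th, eps=1e-4):
        # D_KL[d_i | d_j](theta): one derivative in each slot, evaluated at theta = theta'.
        ei, ej = np.eye(2)[i], np.eye(2)[j]
        return ( kl(th + eps*ei, th + eps*ej) - kl(th + eps*ei, th - eps*ej)
               - kl(th - eps*ei, th + eps*ej) + kl(th - eps*ei, th - eps*ej) ) / (4 * eps**2)

    theta = np.array([0.3, -0.8])
    P = p(theta)

    # Fisher metric of the model: g^F_ij = sum_x P(x) (d_i log P(x)) (d_j log P(x)).
    J = np.zeros((3, 2))
    for i in range(2):
        ei = np.eye(2)[i]
        J[:, i] = (np.log(p(theta + 1e-5*ei)) - np.log(p(theta - 1e-5*ei))) / 2e-5
    gF = (P[:, None] * J).T @ J

    print(gF)
    print(-np.array([[mixed(i, j, theta) for j in range(2)] for i in range(2)]))  # agrees with gF
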
  20. Geometric pre-divergences 20   Definition 1.20 For a contrast

    function D on M, denote by h h : the induced semi-Riemannian metric, ∇, ∇∗ : the induced connections. We define (0, 4)-tensor fields B and B∗ by h(B(X, Y )Z, V ) := −D[XY Z|V ] + D[∇X ∇Y Z|V ], h(V, B∗(X, Y )Z) := −D[V |XY Z] + D[V |∇∗ X ∇∗ Y Z]. We call B the Bartlett tensor field on M, and B∗ the dual Bartlett tensor field on M.     Proposition 1.21 (Eguchi 1993) R, R∗ : the curvature tensors of ∇, ∇∗, respectively. =⇒ R(X, Y )Z = B(Y, X)Z − B(X, Y )Z, R∗(X, Y )Z = B∗(Y, X)Z − B∗(X, Y )Z.  
  21. Geometric pre-divergences 21   Proposition 1.22 D, ˜ D

    : contrast functions on M (M, ∇, h), (M, ˜ ∇, ˜ h) : induced statistical manifolds ϕ, ψ : functions on M. (1) ˜ D(p, q) = eψ(p)+ϕ(q)D(p, q) =⇒ (M, ∇, h) and (M, ˜ ∇, ˜ h) are conformally-projectively equivalent. (2) ˜ D(p, q) = eψ(q)D(p, q) =⇒ (M, ∇, h) and (M, ˜ ∇, ˜ h) are 1-conformally equivalent. (3) ˜ D(p, q) = eϕ(p)D(p, q) =⇒ (M, ∇, h) and (M, ˜ ∇, ˜ h) are (−1)-conformally equivalent.  
  22. Geometric pre-divergences 22 2 Affine immersions f : M →

    Rn+1: an immersion (dim M = n ≥ 2) ξ: a local vector field along f   Definition 2.1 {f, ξ} : M → Rn+1 is an affine immersion def ⇐⇒ For an arbitrary point p ∈ M, Tf(p) Rn+1 = f∗ (Tp M) ⊕ R{ξp } ξ: a transversal vector field   D: the standard flat affine connection on Rn+1 DX f∗ Y = f∗ (∇X Y ) + h(X, Y )ξ, DX ξ = −f∗ (SX) + τ(X)ξ.   Proposition 2.2 {f, ξ}, { ¯ f, ¯ ξ} : affine immersions Suppose ∇ = ¯ ∇, h = ¯ h, S = ¯ S, τ = ¯ τ =⇒ ∃A ∈ GL(n + 1, R) and ∃b ∈ Rn+1 s.t. ¯ f = Af + b, ¯ ξ = Aξ  
  23. Geometric pre-divergences 23 D: the standard flat affine connection on

    Rn+1 DX f∗ Y = f∗ (∇X Y ) + h(X, Y )ξ, DX ξ = −f∗ (SX) + τ(X)ξ. ∇ : the induced connection h : the affine fundamental form S : the affine shape operator τ : the transversal connection form   f : non-degenerate def ⇐⇒ h : non-degenerate {f, ξ} : equiaffine def ⇐⇒ τ = 0   w : the induced volume element (n-form) with respect to {f, ξ} def ⇐⇒ w(X1 , . . . , Xn ) := det(f∗ X1 , . . . , f∗ Xn , ξ), where “det” is the standard volume element on Rn+1.   ∇, τ, w : induced objects from {f, ξ} =⇒ (∇Y w)(X1 , . . . , Xn ) = τ(Y )w(X1 , . . . , Xn )   τ = 0 ⇐⇒ w is parallel with respect to ∇.
  24. Geometric pre-divergences 24 Fundamental structural equations for affine immersions Gauss:

    R(X, Y )Z = h(Y, Z)SX − h(X, Z)SY Codazzi: (∇X h)(Y, Z) + τ(X)h(Y, Z) = (∇Y h)(X, Z) + τ(Y )h(X, Z) (∇X S)(Y ) − τ(X)SY = (∇Y S)(X) − τ(Y )SX Ricci: h(X, SY ) − h(Y, SX) = (∇X τ)(Y ) − (∇Y τ)(X)   Proposition 2.3 M : simply connected. ∇ : a torsion-free affine connection, S : a (1, 1)-tensor field h : a semi-Riemannian metric τ : a 1-form ∇, h, S and τ satisfy the above equations =⇒ There exists an affine immersion {f, ξ} which induces the given ∇, h, S and τ.     Proposition 2.4 {f, ξ} : non-degenerate, equiaffine =⇒ (M, ∇, h) is a 1-conformally flat statistical manifold. If M is simply connected, the converse also holds.  
  25. Geometric pre-divergences 25 U : an open domain of Rn

    ψ : a function on U   We say that {f, ξ} : U → Rn+1 is a graph immersion def ⇐⇒ f : (θ1, . . . , θn) ↦ (θ1, . . . , θn, ψ(θ)), ξ = (0, . . . , 0, 1)   Set ∂i = ∂/∂θi , ψij = ∂2ψ/∂θi∂θj . Then we have D∂i f∗ ∂j = ψij ξ. This implies that ∇ is flat and {θi} is the ∇-affine coordinate system.   Proposition 2.5 (U, ∇, h) : a Hessian manifold ψ : a potential function of h =⇒ (U, ∇, h) can be realized in Rn+1 by a graph immersion with potential ψ.  
  26. Geometric pre-divergences 26 {f, ξ} : nondegenerate, equiaffine Rn+1 :

    the dual space of Rn+1 (≃ Tf(p) Rn+1 (∀p ∈ M)) ⟨ , ⟩ : the canonical pairing of Rn+1 and Rn+1.   v : M → Rn+1 is the conormal map of {f, ξ} def ⇐⇒ ⟨v(p), ξp ⟩ = 1, ⟨v(p), f∗ Xp ⟩ = 0   Geometric divergence   DG : M × M → R : the geometric divergence of (M, ∇, h). def ⇐⇒ DG (p, q) = ⟨v(q), f(p) − f(q)⟩.   cf. affine support function DA : Rn+1 × M → R: DA (x, q) = ⟨v(q), x − f(q)⟩ • DG is independent of the realization of (M, ∇, h). • DG is a contrast function on M. Remark 2.6 The geometric divergence is globally defined, whereas most divergences are only locally defined.
  27. Geometric pre-divergences 27 (U, ∇, h) : a simply connected

    Hessian manifold, U ⊂ Rn : an open domain ( =⇒ (U, h, ∇, ∇∗) is a dually flat space.) =⇒ ∃ψ : a function on U (potential function) such that ∂2ψ/∂θi∂θj = gij =⇒ {f, ξ} : an affine immersion (graph immersion) f : (θ1, . . . , θn) ↦ (θ1, . . . , θn, ψ(θ)), ξ = (0, . . . , 0, 1) v : the conormal map of {f, ξ}, v = (−η1 , . . . , −ηn , 1), ηi = ∂ψ/∂θi By setting ϕ(q) = ∑ ηi (q)θi(q) − ψ(q), we have DG (p, q) = ⟨v(q), f(p) − f(q)⟩ = −∑ ηi (q)θi(p) + ψ(p) + ∑ ηi (q)θi(q) − ψ(q) = ψ(p) + ϕ(q) − ∑ ηi (q)θi(p) = D(p, q)
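
This computation is easy to replay numerically: for any strictly convex potential, the pairing ⟨v(q), f(p) − f(q)⟩ of the graph immersion reproduces the canonical (Bregman) divergence. A sketch, with the illustrative potential ψ(θ) = Σ_i exp θi:

    import numpy as np

    psi      = lambda th: np.sum(np.exp(th))
    grad_psi = lambda th: np.exp(th)                       # eta_i = d psi / d theta^i

    f = lambda th: np.append(th, psi(th))                  # graph immersion f(theta) = (theta, psi(theta))
    v = lambda th: np.append(-grad_psi(th), 1.0)           # conormal map v = (-eta_1, ..., -eta_n, 1)

    rng = np.random.default_rng(2)
    th_p, th_q = rng.normal(size=4), rng.normal(size=4)

    DG = v(th_q) @ (f(th_p) - f(th_q))                     # <v(q), f(p) - f(q)>

    eta_q = grad_psi(th_q)
    phi_q = eta_q @ th_q - psi(th_q)                       # phi(q) = sum eta_i(q) theta^i(q) - psi(q)
    canonical = psi(th_p) + phi_q - eta_q @ th_p           # psi(p) + phi(q) - sum eta_i(q) theta^i(p)

    print(DG, canonical)                                   # the two expressions coincide
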
  28. Geometric pre-divergences 28 3 Centroaffine immersions of codimension two M:

    an n-dimensional manifold f : M → Rn+2: an immersion ξ: a local vector field along f   Definition 3.1 {f, ξ} : M → Rn+2 is a centroaffine immersion of codimension two def ⇐⇒ For an arbitrary point p ∈ M, Tf(p) Rn+2 = f∗ (Tp M) ⊕ R{ξp } ⊕ R{f(p)} ξ: a transversal vector field   D: the standard flat affine connection on Rn+2 DX f∗ Y = f∗ (∇X Y ) + h(X, Y )ξ + k(X, Y )f, DX ξ = −f∗ (SX) + τ(X)ξ + υ(X)f.
  29. Geometric pre-divergences 29 ∇ : the induced connection h :

    the affine fundamental form τ : the transversal connection form S : the affine shape operator w(X1 , · · · , Xn ) := det(f∗ X1 , · · · , f∗ Xn , ξ, f) the induced volume element   Proposition 3.2 ∇X w = τ(X)w     Definition 3.3 f : non-degenerate def ⇐⇒ h : non-degenerate {f, ξ} : equiaffine def ⇐⇒ τ = 0     Proposition 3.4 {f, ξ} : M → Rn+2 : non-degenerate, equiaffine =⇒ (M, ∇, h) is a statistical manifold, conformally-projectively flat  
  30. Geometric pre-divergences 30 Duality of affine immersions Rn+2 : the

    dual vector space of Rn+2 (≃ Tf(p) Rn+2 (∀p ∈ M)) ⟨ , ⟩ : the pairing of Rn+2 and Rn+2   Definition 3.5 v, ξ∗ : M → Rn+2 def ⇐⇒ ⟨v(p), ξp ⟩ = 1 ⟨ξ∗(p), ξp ⟩ = 0, ⟨v(p), f(p)⟩ = 0 ⟨ξ∗(p), f(p)⟩ = 1, ⟨v(p), f∗ Xp ⟩ = 0 ⟨ξ∗(p), f∗ Xp ⟩ = 0,   We call v the conormal map of {f, ξ} If h is non-degenerate =⇒ {v, ξ∗} : M → Rn+2 is a centroaffine immersion of codimension two. We call {v, ξ∗} the dual map of {f, ξ}.   Proposition 3.6 {f, ξ} induces (M, ∇, h) ⇐⇒ {v, ξ∗} induces (M, ∇∗, h).  
  31. Geometric pre-divergences 31   Definition 3.5 v, ξ∗ :

    M → Rn+2 : the dual map of {f, ξ}. def ⇐⇒ ⟨v(p), ξp ⟩ = 1 ⟨ξ∗(p), ξp ⟩ = 0, ⟨v(p), f(p)⟩ = 0 ⟨ξ∗(p), f(p)⟩ = 1, ⟨v(p), f∗ Xp ⟩ = 0 ⟨ξ∗(p), f∗ Xp ⟩ = 0,     Definition 3.7 DG : M × M → R : the geometric divergence def ⇐⇒ DG (p, q) = ⟨v(q), f(p) − f(q)⟩   The geometric divergence DG is a contrast function, and this is a special form of an affine support function.
  32. Geometric pre-divergences 32 Legendre transformation Proposition 3.8 (M, g, ∇,

    ∇∗) : a dually flat space {θi} : a ∇-affine coordinate system {ηi} : a ∇∗-affine coordinate system =⇒ ∂ψ/∂θi = ηi , ∂ϕ/∂ηi = θi, ∂2ψ/∂θi∂θj = gij , ∂2ϕ/∂ηi∂ηj = gij (the inverse matrix of (gij )), g(∂/∂θi , ∂/∂ηj ) = 1 (i = j), 0 (i ̸= j), ψ(p) + ϕ(p) − ∑_{i=1}^{n} θi(p)ηi (p) = 0,   (M, ∇, g) and (M, ∇∗, g) are flat statistical manifolds.  
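
A concrete Legendre pair, checked numerically (the potential ψ(θ) = Σ_i exp θi is an illustrative choice; then ηi = exp θi and ϕ(η) = Σ_i (ηi log ηi − ηi)):

    import numpy as np

    th  = np.array([0.2, -1.0, 0.5])
    eta = np.exp(th)                          # eta_i = d psi / d theta^i
    psi = np.sum(np.exp(th))
    phi = np.sum(eta * np.log(eta) - eta)     # the Legendre transform of psi

    print(psi + phi - th @ eta)               # ~ 0:  psi(p) + phi(p) - sum theta^i eta_i = 0
    print(np.log(eta) - th)                   # ~ 0:  theta^i = d phi / d eta_i

    g_theta = np.diag(np.exp(th))             # g_ij = d^2 psi / d theta^i d theta^j
    g_eta   = np.diag(1.0 / eta)              # g^ij = d^2 phi / d eta_i d eta_j
    print(g_theta @ g_eta)                    # ~ identity: the two Hessians are mutually inverse
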
  33. Geometric pre-divergences 33 f = (θ1, . . . , θn, ψ, 1), ξ = (0, . . . , 0, 1, 0)

    =⇒ f∗ ∂/∂θn = ∂f/∂θn = (0, . . . , 0, 1, ∂ψ/∂θn, 0) v = (−η1 , . . . , −ηn , 1, ϕ), ξ∗ = (0, . . . , 0, 0, 1)   ⟨v(p), f∗ ∂/∂θi⟩ = 0 ⇐⇒ −ηi (p) + ∂ψ/∂θi (p) = 0 ⟨v(p), f(p)⟩ = 0 ⇐⇒ ψ(p) + ϕ(p) − ∑_{i=1}^{n} θi(p)ηi (p) = 0 DG (p, q) = ⟨v(q), f(p) − f(q)⟩ = ψ(p) + ϕ(q) − ∑_{i=1}^{n} θi(p)ηi (q)  
  34. Geometric pre-divergences 34 4 Quasi statistical manifolds and pre-contrast functions

    M : a manifold (an open domain in Rn) h : a nondegenerate (0, 2)-tensor field on M. (h(X, Y ) = 0 for ∀Y ∈ X(M) =⇒ X = 0) ∇ : an affine connection on M   Definition 4.1 ∇∗: the quasi dual connection (or left conjugate connection) of ∇ with respect to h def ⇐⇒ Xh(Y, Z) = h(∇∗ X Y, Z) + h(Y, ∇X Z) for X, Y, Z ∈ X(M)   We remark that (∇∗)∗ ̸= ∇ in general.   Proposition 4.2 If h is symmetric h(X, Y ) = h(Y, X) or skew-symmetric h(X, Y ) = −h(Y, X) =⇒ (∇∗)∗ = ∇   If h is symmetric, then ∇(0) := (1/2)(∇ + ∇∗) =⇒ ∇(0)h = 0
  35. Geometric pre-divergences 35 The torsion tensor T ∇ of ∇:

    T ∇(X, Y ) = ∇X Y − ∇Y X − [X, Y ] The curvature tensor R∇ of ∇: R∇(X, Y )Z = ∇X ∇Y Z − ∇Y ∇X Z − ∇[X,Y ] Z   Proposition 4.3 (∇X h)(Y, Z) − (∇Y h)(X, Z) = h(T ∇∗ (X, Y ) − T ∇(X, Y ), Z) h(R∇∗ (X, Y )Z, W ) + h(Z, R∇(X, Y )W ) = 0   Definition 4.4 (M, ∇, h): a quasi statistical manifold def ⇐⇒ (∇X h)(Y, Z) − (∇Y h)(X, Z) = −h(T ∇(X, Y ), Z) In addition, if h is a semi-Riemannian metric, we say that (M, ∇, h) is a statistical manifold admitting torsion (SMAT).   Proposition 4.5 (M, ∇, h) : a quasi statistical manifold =⇒ The quasi dual connection ∇∗ of ∇ is torsion free.  
  36. Geometric pre-divergences 36   Suppose that (M, ∇, h)

    is a quasi statistical manifold. (1) (M, ∇, h) is a Hessian manifold def ⇐⇒ h is symmetric, R∇ = 0 and T ∇ = 0 ⇐⇒ (M, h, ∇, ∇∗) is a dually flat space. (2) (M, ∇, h) is a partially flat space (or space of distant parallelism) def ⇐⇒ R∇ = 0 (R∇∗ = 0, T ∇∗ = 0, but T ∇ ̸= 0, in general.) ⇐⇒ (M, h, ∇, ∇∗) is a partially flat space.   Remark 4.6 For a given affine connection, we defined another affine connection. Therefore it is necessary to assume nondegeneracy of the (0, 2)-tensor field. If we can define a pair of mutually dual affine connections directly, we need not assume nondegeneracy of the (0, 2)-tensor field.
  37. Geometric pre-divergences 37   Definition 4.7 Two quasi statistical

    manifolds (M, ∇, h) and (M, ¯∇, ¯h) are 1-conformally equivalent def ⇐⇒ There exists a function ψ on M such that ¯h(X, Y ) = eψ h(X, Y ), ¯∇X Y = ∇X Y − h(X, Y )gradh ψ where gradh ψ is the right gradient vector field defined by h(X, gradh ψ) = dψ(X).     Definition 4.8 (M, ∇, h) is 1-conformally partially flat def ⇐⇒ (M, ∇, h) is locally 1-conformally equivalent to some partially flat quasi statistical manifold.  
  38. Geometric pre-divergences 38 4.2 Pre-contrast functions M : a manifold

    ρ : a function on M × T M ρ[X1 · · · Xi |Y1 · · · Yj Z] : a function on M defined by ρ[X1 · · · Xi |Y1 · · · Yj Z](r) := (X1 )(p) · · ·(Xi )(p) (Y1 )(q) · · · (Yj )(q) ρ(p, Zq )|p=r q=r For example, ρ[ |XY ](r) = X(q) ρ(p, Yq )|p=r q=r ρ[XZ|Y ](r) = X(p) Z(p) ρ(p, Yq )|p=r q=r ρ[XY |ZV ](r) = X(p) Y(p) Z(q) ρ(p, Vq )|p=r q=r . . .   Definition 4.9 For X, X1 , X2 , Y ∈ X(M) and f1 , f2 ∈ C∞(M), ρ : M × T M → R : a pre-contrast function on M def ⇐⇒ (1) ρ(p, (f1 X1 + f2 X2 )q ) = f1 (q)ρ(p, (X1 )q ) + f2 (q)ρ(p, (X2 )q ), (2) ρ[|X] = 0 (i.e. ∀r ∈ M, ρ(r, Xr ) = 0), (3) h(X, Y ) := −ρ[X|Y ] is non-degenerate.   D(p, q) : contrast function =⇒ X(q) D(p, q) : pre-contrast function
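
The last remark explains one source of pre-contrast functions, but the notion is strictly wider: h(X, Y) = −ρ[X|Y] need not be symmetric. A toy example on R2 (illustrative, not from the slides) whose induced h is nondegenerate but non-symmetric:

    import numpy as np

    # rho(p, X_q) = <A (q - p), X> with a non-symmetric matrix A; it is linear in X
    # and satisfies rho(r, X_r) = 0, so it is a pre-contrast function on R^2.
    A = np.array([[2.0, 1.0],
                  [0.0, 1.5]])
    rho = lambda p, q, X: (A @ (q - p)) @ X

    def h_ij(i, j, r, eps=1e-5):
        # h_ij(r) = -rho[d_i | d_j](r): differentiate rho(p, (d_j)_q) in p along d_i at p = q = r.
        ei, ej = np.eye(2)[i], np.eye(2)[j]
        return -(rho(r + eps*ei, r, ej) - rho(r - eps*ei, r, ej)) / (2 * eps)

    r = np.array([0.1, 0.4])
    h = np.array([[h_ij(i, j, r) for j in range(2)] for i in range(2)])
    print(h)   # equals A^T: nondegenerate but not symmetric, so this only gives a
               # quasi statistical manifold, not a statistical manifold
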
  39. Geometric pre-divergences 39   Proposition 4.10 We can define

    affine connections ∇ and ∇∗ by h(∇∗ X Y, Z) = −ρ[XY |Z], h(Y, ∇X Z) = −ρ[Y |XZ]. Moreover, ∇, ∇∗ : mutually dual with respect to h. ∇∗ : torsion-free   (Proof) Xh(Y, Z) = −Xρ[Y |Z] = −ρ[XY |Z] − ρ[Y |XZ] = h(∇∗ X Y, Z) + h(Y, ∇X Z) h(∇∗ X Y − ∇∗ Y X, Z) = −ρ[XY |Z] + ρ[Y X|Z] = −ρ[[X, Y ]|Z] = h([X, Y ], Z) Lemma 4.11 ρ(p, Xq ) : a pre-contrast function on M =⇒ (M, ∇, h) is a quasi-statistical manifold In particular, if h is a semi-Riemannian metric =⇒ (M, ∇, h) is a statistical manifold admitting torsion.
  40. Geometric pre-divergences 40 5 Affine distributions ω : T M

    → Rn+1: an Rn+1-valued 1-form ξ : M → Rn+1: an Rn+1-valued function   Definition 5.1 {ω, ξ} is an affine distribution def ⇐⇒ For an arbitrary point p ∈ M, Rn+1 = Image ωp ⊕ R{ξp } ξ: a transversal vector field     {f, ξ}: an affine immersion =⇒ {df, ξ}: an affine distribution   Xω(Y ) = ω(∇X Y ) + h(X, Y )ξ, Xξ = −ω(SX) + τ(X)ξ. ∇ : an affine connection (T ∇(X, Y ) ̸= 0 in general) h : a (0, 2)-tensor field (h(X, Y ) ̸= h(Y, X) in general) S : a (1, 1)-tensor field τ : a 1-form
  41. Geometric pre-divergences 41 Xω(Y ) = ω(∇X Y ) +

    h(X, Y )ξ, Xξ = −ω(SX) + τ(X)ξ.   ω : symmetric def ⇐⇒ h : symmetric ω : nondegenerate def ⇐⇒ h : nondegenerate {ω, ξ} : equiaffine def ⇐⇒ τ = 0   Symmetry and nondegeneracy of ω are independent of ξ   Proposition 5.2 Set ˜ ξ := ω(V ) + ϕξ. Then the induced objects change as follows: ∇X Y = ˜ ∇X Y + ˜ h(X, Y )V, h(X, Y ) = ϕ˜ h(X, Y ), ˜ SX − ˜ τ(X)V = ϕSX − ∇X V, ϕ˜ τ(X) = h(X, V ) + dϕ(X) + ϕτ(X).     Proposition 5.3 Image (dω)p ⊂ Image ωp ⇐⇒ h: symmetric Image (dξ)p ⊂ Image ωp ⇐⇒ τ = 0  
  42. Geometric pre-divergences 42 Fundamental structural equations for affine distributions: Gauss

    equation: R(X, Y )Z = h(Y, Z)SX − h(X, Z)SY, Codazzi equations: (∇X h)(Y, Z) + h(Y, Z)τ(X) −(∇Y h)(X, Z) + h(X, Z)τ(Y ) = −h(T ∇(X, Y ), Z), (∇X S)(Y ) + τ(Y )SX − (∇Y S)(X) − τ(X)SY = −S(T ∇(X, Y )), Ricci equation: h(X, SY ) − (∇X τ)(Y ) − h(Y, SX) + (∇Y τ)(X) = τ(T ∇(X, Y )).   Proposition 5.4 (Haba (2020)) M : simply connected. ∇ : an affine connection, S : a (1, 1)-tensor field h : a (0, 2)-tensor field, τ : a 1-form ∇, h, S and τ satisfy fundamental equations =⇒ ∃{ω, ξ} : an affine distribution which induces ∇, h, S and τ.     Theorem 5.5 (Haba (2020)) (1) {ω, ξ} : nondegenerate, equiaffine =⇒ (M, ∇, h) : 1-conformally partially flat quasi statistical manifold. (2) {ω, ξ} : symmetric, nondegenerate, equiaffine =⇒ (M, ∇, h) : 1-conformally partially flat SMAT. If M is simply connected, the converses also hold  
  43. Geometric pre-divergences 43 SMAT with the SLD Fisher metric (Kurose

    2007) Herm(d) : the set of all Hermitian matrices of degree d. S = {P ∈ Herm(d) | P > 0, trace P = 1} TP S ≅ A0 , A0 = {X ∈ Herm(d) | trace X = 0} We denote by X̃ the vector field on S corresponding to X ∈ A0 .   For P ∈ S, X ∈ A0 , define ωP (X) (∈ Herm(d)) and ξ by X = (1/2)(P ωP (X) + ωP (X)P ), ξ = −Id Then {ω, ξ} is an equiaffine distribution.   The induced quantities are given by hP (X, Y ) = (1/2) trace( P (ωP (X)ωP (Y ) + ωP (Y )ωP (X)) ) ( = gP (X, Y ), the SLD Fisher metric ), (∇X̃ Ỹ )P = hP (X, Y )P − (1/2)(X ωP (Y ) + ωP (Y )X). (R = R∗ = 0, T ∗ = 0, but T ̸= 0)
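
A small numerical sketch of this construction (illustrative; it uses real symmetric matrices instead of general Hermitian ones, and scipy's Sylvester-equation solver for the defining equation of ωP(X)):

    import numpy as np
    from scipy.linalg import solve_sylvester

    rng = np.random.default_rng(3)
    d = 3

    # A (real symmetric) density matrix P: positive definite with trace 1.
    B = rng.normal(size=(d, d))
    P = B @ B.T + np.eye(d)
    P /= np.trace(P)

    def rand_tangent():
        # a traceless symmetric matrix, i.e. a tangent vector X in A_0
        C = rng.normal(size=(d, d)); C = (C + C.T) / 2
        return C - (np.trace(C) / d) * np.eye(d)

    def omega(X):
        # omega_P(X): the symmetric logarithmic derivative L solving (P/2) L + L (P/2) = X
        return solve_sylvester(P / 2, P / 2, X)

    def h(X, Y):
        LX, LY = omega(X), omega(Y)
        return 0.5 * np.trace(P @ (LX @ LY + LY @ LX))     # the SLD inner product h_P(X, Y)

    X, Y = rand_tangent(), rand_tangent()
    print(np.allclose(0.5 * (P @ omega(X) + omega(X) @ P), X))   # the defining equation holds
    print(h(X, Y), h(Y, X), h(X, X) > 0)                         # symmetric and positive
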
  44. Geometric pre-divergences 44 5.2 Conormal maps and geometric pre-divergence {ω,

    ξ} : nondegenerate, equiaffine Rn+1 : the dual space of Rn+1 ⟨ , ⟩ : the canonical pairing of Rn+1 and Rn+1.   v : M → Rn+1 is the conormal map of {ω, ξ} def ⇐⇒ ⟨v(p), ξp ⟩ = 1, ⟨v(p), ω(Xp )⟩ = 0     We define a function on M × T M by ρ(q, X) = ⟨v(q), ω(X)⟩. ρ is called the geometric pre-divergence on M.     Theorem 1 (M, ∇, h) : a simply connected 1-conformally partially flat quasi statistical manifold =⇒ there exists a pre-contrast function which induces (M, ∇, h).  
  45. Geometric pre-divergences 45 6 Generalized projection theorems Theorem 2 {ω,

    ξ} : an affine distribution to Rn+1 (M, ∇, h) : a quasi statistical manifold induced from {ω, ξ} with the quasi-dual connection ∇∗ ρ : the geometric quasi-divergence on (M, ∇, h) N ⊂ M : a submanifold in M p ∈ M\N, q ∈ N γ : the ∇∗-geodesic connecting p and q Then γ ⊥ N at q (i.e., h(γ̇(0), V ) = 0, ∀V ∈ Tq N) ⇐⇒ ρ(γ(t), V ) = 0 (∀V ∈ Tq N) Remark 6.1 h(V, γ̇(0)) ̸= 0 in general
  46. Geometric pre-divergences 46 References [1] Haba, K., 1-Conformal geometry of

    quasi statistical manifolds, Inf. Geom. (2020). [2] Haba, K. and Matsuzoe, H., Complex affine distributions, Differential Geom. Appl., 75 (2021), 101734. [3] Henmi, M. and Matsuzoe, H., Statistical manifolds admitting torsion and partially flat spaces, Geometric structures of information, Signals Commun. Technol., Springer, Cham, 2019, 37–50. [4] Kurose, T., Dual connections and affine geometry, Math. Z., 203 (1990), no. 1, 115–121. [5] Kurose, T., On the divergences of 1-conformally flat statistical manifolds, Tohoku Math. J., 46 (1994), no. 3, 427–433. [6] Kurose, T., Statistical manifolds admitting torsion, Geometry and Something, 2007, in Japanese.
  47. Geometric pre-divergences 47 [7] Blaschke, W., Vorlesungen über Differentialgeometrie

    II, Affine Differentialgeometrie, Springer, Berlin, 1923. [8] Hotelling, H., Spaces of statistical parameters, Bull. Am. Math. Soc. (AMS), 36 (1930), 191. [9] Norden, A. P., Über Paare konjugierter Parallelübertragungen (On pairs of conjugate parallel transports, German), Trudy Semin. Vekt. Tenzorn. Anal., 4 (1937), 205–255. [10] Norden, A. P., On pairs of conjugate parallel translations in n-dimensional spaces, C.R. (Doklady) Acad. Sci. URSS (N.S.), 49 (1945), 625–628. [11] Rao, C. R., Information and the accuracy attainable in the estimation of statistical parameters, Bull. Calcutta Math. Soc., 37 (1945), 81–91. [12] Sen, R. N., On parallelism in Riemannian space, Bull. Calcutta Math. Soc., 36 (1944), 102–107. [13] Sen, R. N., On parallelism in Riemannian space. II, Bull. Calcutta Math. Soc., 37 (1945), 153–159.
  48. Geometric pre-divergences 48 [14] Sen, R. N., On parallelism in

    Riemannian space. III, Bull. Calcutta Math. Soc., 38 (1946), 161–167. [15] Čencov, N. N., A nonsymmetric distance between probability distributions, and entropy and the theorem of Pythagoras (Russian), Mat. Zametki, 4 (1968), 323–332. [16] Čencov, N. N., Statistical decision rules and optimal inferences (Russian), Izdat. "Nauka", Moscow, 1972, 520 pp. [17] Čencov, N. N., Statistical decision rules and optimal inference, translation from the Russian edited by Lev J. Leifman, Translations of Mathematical Monographs, 53, American Mathematical Society, Providence, R.I., 1982, viii+499 pp. [18] Csiszár, I., I-divergence geometry of probability distributions and minimization problems, Ann. Probability, 3 (1975), 146–158. [19] Nagaoka, H. and Amari, S., Differential geometry of smooth families of probability distributions, Technical Report METR 82-7, University of Tokyo, 1982.