Consider the system
\[
\begin{cases}
x(k+1) = A x(k) + B u(k) + d(k), \\[2pt]
z(k) = \begin{bmatrix} Q^{1/2} & 0 \\ 0 & R^{1/2} \end{bmatrix} \begin{bmatrix} x(k) \\ u(k) \end{bmatrix},
\end{cases}
\tag{1}
\]
where $k \in \mathbb{N}$, $x \in \mathbb{R}^n$ is the state, $u \in \mathbb{R}^m$ is the control input, $d$ is a disturbance term, and $z$ is the performance signal of interest. We assume that $(A, B)$ is stabilizable. Finally, $Q \succeq 0$ and $R \succ 0$ are weighting matrices. Here, $\succ$ ($\succeq$) and $\prec$ ($\preceq$) denote positive and negative (semi)definiteness.

The problem of interest is linear quadratic regulation, phrased as designing a state-feedback gain $K$ that renders $A + BK$ Schur and minimizes the $\mathcal{H}_2$-norm of the transfer function $\mathcal{T}(K) := d \to z$ of the closed-loop system
\[
\begin{bmatrix} x(k+1) \\ z(k) \end{bmatrix}
=
\left[\begin{array}{c|c}
A + BK & I \\ \hline
\begin{matrix} Q^{1/2} \\ R^{1/2} K \end{matrix} & 0
\end{array}\right]
\begin{bmatrix} x(k) \\ d(k) \end{bmatrix},
\tag{2}
\]
where our notation $\mathcal{T}(K)$ emphasizes the dependence of the transfer function on $K$. When $A + BK$ is Schur, it holds that
\[
\|\mathcal{T}(K)\|_2^2 = \operatorname{trace}(Q P) + \operatorname{trace}\left(K^\top R K P\right),
\tag{3}
\]
where $P$ is the controllability Gramian of the closed-loop system (2), which coincides with the unique solution to the associated Lyapunov equation.

Least-squares identification and certainty-equivalence control

The conventional approach to data-driven LQR is indirect: first a parametric state-space model is identified from data, and later on controllers are synthesized based on this model as in Section II-A. We will briefly review this approach.

Regarding the identification task, consider a $T$-long time series of inputs, disturbances, states, and successor states
\[
\begin{aligned}
U_0 &:= \begin{bmatrix} u(0) & u(1) & \dots & u(T-1) \end{bmatrix} \in \mathbb{R}^{m \times T}, \\
D_0 &:= \begin{bmatrix} d(0) & d(1) & \dots & d(T-1) \end{bmatrix} \in \mathbb{R}^{n \times T}, \\
X_0 &:= \begin{bmatrix} x(0) & x(1) & \dots & x(T-1) \end{bmatrix} \in \mathbb{R}^{n \times T}, \\
X_1 &:= \begin{bmatrix} x(1) & x(2) & \dots & x(T) \end{bmatrix} \in \mathbb{R}^{n \times T},
\end{aligned}
\]
satisfying the dynamics (1), that is,
\[
X_1 - D_0 = \begin{bmatrix} B & A \end{bmatrix} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}.
\tag{5}
\]
It is convenient to record the data as consecutive time series, i.e., column $i$ of $X_1$ coincides with column $i + 1$ of $X_0$, but this is not strictly needed for our developments: the data may originate from independent experiments. Let for brevity
\[
W_0 := \begin{bmatrix} U_0 \\ X_0 \end{bmatrix}.
\]

Indirect and certainty-equivalence LQR, in summary:
• collect I/O data $(X_0, U_0, X_1)$ satisfying $X_1 = A X_0 + B U_0 + D_0$, with $D_0$ unknown and the data persistently exciting (PE): $\operatorname{rank} \begin{bmatrix} U_0 \\ X_0 \end{bmatrix} = n + m$;
• perform least-squares system identification (SysID), then design the certainty-equivalent LQR for the identified model (optimal in an MLE setting).
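As a minimal sketch of the indirect pipeline described above, the following Python/NumPy code fits $[B \; A]$ by ordinary least squares from $(X_0, U_0, X_1)$, treating the unknown $D_0$ as the residual. It then computes the LQR gain for the estimated model and evaluates the cost (3) on the true system. Several ingredients are assumptions for illustration and are not taken from the text: the example matrices `A` and `B`, the noise level, the choice $Q = R = I$, the helper names `least_squares_id`, `lqr_gain`, and `h2_cost`, the plain Riccati fixed-point iteration used in place of a dedicated solver, and the standard Gramian equation $(A+BK) P (A+BK)^\top - P + I = 0$.

```python
import numpy as np


def least_squares_id(X0, U0, X1):
    """Estimate (A, B) by minimizing ||X1 - [B A] W0||_F, with W0 := [U0; X0]."""
    m = U0.shape[0]
    W0 = np.vstack([U0, X0])
    # Persistency of excitation: W0 must have full row rank n + m.
    if np.linalg.matrix_rank(W0) < W0.shape[0]:
        raise ValueError("data are not persistently exciting")
    BA = X1 @ np.linalg.pinv(W0)          # least-squares estimate of [B A]
    return BA[:, m:], BA[:, :m]           # (A_hat, B_hat)


def lqr_gain(A, B, Q, R, iters=10_000, tol=1e-10):
    """LQR gain K (convention u = K x) via a Riccati fixed-point iteration."""
    S = Q.astype(float).copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        S_next = Q + A.T @ S @ (A + B @ K)   # one Riccati recursion step
        done = np.max(np.abs(S_next - S)) < tol
        S = S_next
        if done:
            break
    K = -np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
    return K, S


def h2_cost(A, B, K, Q, R):
    """Cost (3): trace(Q P) + trace(K' R K P), P the closed-loop Gramian."""
    Acl = A + B @ K
    n = Acl.shape[0]
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf                      # A + BK is not Schur
    # Solve Acl P Acl' - P + I = 0 by vectorization with a Kronecker product.
    lhs = np.eye(n * n) - np.kron(Acl, Acl)
    P = np.linalg.solve(lhs, np.eye(n).ravel()).reshape(n, n)
    return np.trace(Q @ P) + np.trace(K.T @ R @ K @ P)


# Hypothetical example system (open-loop unstable) and weights.
rng = np.random.default_rng(0)
A = np.array([[1.1, 0.5], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
n, m, T = 2, 1, 50
Q, R = np.eye(n), np.eye(m)

# Data from independent one-step experiments; D0 is never used for identification.
U0 = rng.standard_normal((m, T))
X0 = rng.standard_normal((n, T))
D0 = 0.01 * rng.standard_normal((n, T))
X1 = A @ X0 + B @ U0 + D0

A_hat, B_hat = least_squares_id(X0, U0, X1)
K_ce, _ = lqr_gain(A_hat, B_hat, Q, R)     # certainty-equivalent gain
K_opt, S_opt = lqr_gain(A, B, Q, R)        # model-based gain, for comparison

cost_ce = h2_cost(A, B, K_ce, Q, R)
cost_opt = h2_cost(A, B, K_opt, Q, R)
print("certainty-equivalent cost:", cost_ce)
print("model-based optimal cost: ", cost_opt)
```

Because the text allows the data to come from independent experiments, the sketch samples the columns of `X0` freely instead of simulating one long trajectory, which sidesteps growth of the open-loop unstable state. The certainty-equivalent gain is evaluated on the true `(A, B)`, so its cost can be compared directly with that of the model-based gain.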