Matrix Differentiation
CS5240 Theoretical Foundations in Multimedia
Leow Wee Kheng
Department of Computer Science, School of Computing
National University of Singapore

Linear Fitting Revisited

Linear fitting solves this problem: Given $n$ data points $\mathbf{p}_i = (x_{i1}, \ldots, x_{im})$, $1 \le i \le n$, and their corresponding values $v_i$, find a linear function $f$ that minimizes the error
\[ E = \sum_{i=1}^{n} \bigl(f(\mathbf{p}_i) - v_i\bigr)^2. \tag{1} \]
The linear function $f(\mathbf{p}_i)$ has the form
\[ f(\mathbf{p}) = f(x_1, \ldots, x_m) = a_1 x_1 + \cdots + a_m x_m + a_{m+1}. \tag{2} \]

The data points are organized into a matrix equation
\[ \mathbf{D}\mathbf{a} = \mathbf{v}, \tag{3} \]
where
\[ \mathbf{D} = \begin{bmatrix} x_{11} & \cdots & x_{1m} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x_{n1} & \cdots & x_{nm} & 1 \end{bmatrix}, \quad
\mathbf{a} = \begin{bmatrix} a_1 \\ \vdots \\ a_m \\ a_{m+1} \end{bmatrix}, \quad
\mathbf{v} = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}. \tag{4} \]
The solution of Eq. 3 is
\[ \mathbf{a} = (\mathbf{D}^\top \mathbf{D})^{-1} \mathbf{D}^\top \mathbf{v}. \tag{5} \]

Denote each row of $\mathbf{D}$ as $\mathbf{d}_i^\top$. Then
\[ E = \sum_{i=1}^{n} (\mathbf{d}_i^\top \mathbf{a} - v_i)^2 = \|\mathbf{D}\mathbf{a} - \mathbf{v}\|^2. \tag{6} \]
So the linear least-squares problem can be described very compactly as
\[ \min_{\mathbf{a}} \|\mathbf{D}\mathbf{a} - \mathbf{v}\|^2. \tag{7} \]
To show that the solution in Eq. 5 minimizes the error $E$, we need to differentiate $E$ with respect to $\mathbf{a}$ and set the derivative to zero:
\[ \frac{dE}{d\mathbf{a}} = \mathbf{0}. \tag{8} \]
How do we do this differentiation?

The obvious (but hard) way:
\[ E = \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} a_j x_{ij} + a_{m+1} - v_i \Bigr)^2. \tag{9} \]
Expanding the equation explicitly gives
\[ \frac{\partial E}{\partial a_k} =
\begin{cases}
\displaystyle 2 \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} a_j x_{ij} + a_{m+1} - v_i \Bigr) x_{ik}, & k \ne m+1, \\[2ex]
\displaystyle 2 \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} a_j x_{ij} + a_{m+1} - v_i \Bigr), & k = m+1.
\end{cases} \]
Then set $\partial E / \partial a_k = 0$ and solve for $a_k$. This is slow, tedious and error-prone!

(Cartoon slides: "Which one do you like to be?" and "At least like these?")
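The closed-form solution of Eq. 5 can be sanity-checked numerically. Below is a minimal NumPy sketch; the data sizes, coefficients and noise level are made-up illustrations, not values from the slides:

```python
import numpy as np

# Hypothetical data: n = 50 points in m = 3 dimensions.
rng = np.random.default_rng(0)
n, m = 50, 3
X = rng.normal(size=(n, m))
a_true = np.array([2.0, -1.0, 0.5, 3.0])           # a_1, ..., a_m, a_{m+1}
D = np.hstack([X, np.ones((n, 1))])                # Eq. 4: append a column of ones
v = D @ a_true + rng.normal(scale=0.01, size=n)    # noisy observed values v_i

# Eq. 5: a = (D^T D)^{-1} D^T v  (the normal-equations solution)
a = np.linalg.inv(D.T @ D) @ D.T @ v

# Cross-check against NumPy's least-squares solver for Eq. 7
a_lstsq, *_ = np.linalg.lstsq(D, v, rcond=None)
print(np.allclose(a, a_lstsq))   # True
```

Forming the explicit inverse mirrors Eq. 5 for exposition; in numerical practice `np.linalg.lstsq` (or `np.linalg.solve` on the normal equations) is the preferred route.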
Matrix Derivatives

There are 6 common types of matrix derivatives:

Type                   Scalar $y$                        Vector $\mathbf{y}$                       Matrix $\mathbf{Y}$
Scalar $x$             $\partial y/\partial x$           $\partial\mathbf{y}/\partial x$           $\partial\mathbf{Y}/\partial x$
Vector $\mathbf{x}$    $\partial y/\partial\mathbf{x}$   $\partial\mathbf{y}/\partial\mathbf{x}$
Matrix $\mathbf{X}$    $\partial y/\partial\mathbf{X}$

Derivatives by Scalar

In both notations, $\partial y/\partial x$ is the usual scalar derivative. For a vector $\mathbf{y} = (y_1, \ldots, y_m)^\top$:

Numerator layout notation:
\[ \frac{\partial\mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x} \end{bmatrix} \]
Denominator layout notation:
\[ \frac{\partial\mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} & \cdots & \dfrac{\partial y_m}{\partial x} \end{bmatrix} \]
In numerator layout, for a matrix $\mathbf{Y}$:
\[ \frac{\partial\mathbf{Y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_{11}}{\partial x} & \cdots & \dfrac{\partial y_{1n}}{\partial x} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_{m1}}{\partial x} & \cdots & \dfrac{\partial y_{mn}}{\partial x} \end{bmatrix} \]

Derivatives by Vector

Numerator layout notation:
\[ \frac{\partial y}{\partial\mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}, \qquad
\frac{\partial\mathbf{y}}{\partial\mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix} \]
Denominator layout notation:
\[ \frac{\partial y}{\partial\mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}, \qquad
\frac{\partial\mathbf{y}}{\partial\mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_1}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix} \]

Derivative by Matrix

Numerator layout notation:
\[ \frac{\partial y}{\partial\mathbf{X}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_{11}} & \cdots & \dfrac{\partial y}{\partial x_{m1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y}{\partial x_{1n}} & \cdots & \dfrac{\partial y}{\partial x_{mn}} \end{bmatrix} \]
Denominator layout notation:
\[ \frac{\partial y}{\partial\mathbf{X}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_{11}} & \cdots & \dfrac{\partial y}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y}{\partial x_{m1}} & \cdots & \dfrac{\partial y}{\partial x_{mn}} \end{bmatrix} \]

Pictorial Representation

(Figure: pictorial comparison of numerator and denominator layouts.)

Caution

Most books and papers don't state which convention they use. Reference 2 uses both conventions but clearly differentiates them. It is best not to mix the two conventions in your equations. We adopt the numerator layout notation.

Commonly Used Derivatives

Here, the scalar $a$, vector $\mathbf{a}$ and matrix $\mathbf{A}$ are not functions of $x$ and $\mathbf{x}$.
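The adopted numerator layout can be illustrated with a finite-difference Jacobian. This is a sketch with made-up data; the helper name `jacobian_numerator` is my own, not from the slides:

```python
import numpy as np

def jacobian_numerator(f, x, eps=1e-6):
    """Finite-difference Jacobian in numerator layout:
    entry (i, j) is dy_i / dx_j, so the result is m x n."""
    y0 = f(x)
    m, n = y0.size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (f(x + dx) - y0) / eps
    return J

# y = A x maps R^3 to R^2; numerator layout gives a 2 x 3 Jacobian
# (here equal to A itself); denominator layout would give its 3 x 2 transpose.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
f = lambda x: A @ x
J = jacobian_numerator(f, np.array([1.0, -1.0, 2.0]))
print(J.shape)             # (2, 3)
print(np.allclose(J, A))   # True
```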
(C1) $\dfrac{d\mathbf{a}}{dx} = \mathbf{0}$ (column matrix)

(C2) $\dfrac{da}{d\mathbf{x}} = \mathbf{0}$ (row matrix)

(C3) $\dfrac{da}{d\mathbf{X}} = \mathbf{0}$ (matrix)

(C4) $\dfrac{d\mathbf{a}}{d\mathbf{x}} = \mathbf{0}$ (matrix)

(C5) $\dfrac{d\mathbf{x}}{d\mathbf{x}} = \mathbf{I}$

(C6) $\dfrac{d(\mathbf{a}^\top\mathbf{x})}{d\mathbf{x}} = \dfrac{d(\mathbf{x}^\top\mathbf{a})}{d\mathbf{x}} = \mathbf{a}^\top$

(C7) $\dfrac{d(\mathbf{x}^\top\mathbf{x})}{d\mathbf{x}} = 2\mathbf{x}^\top$

(C8) $\dfrac{d(\mathbf{x}^\top\mathbf{a})^2}{d\mathbf{x}} = 2\,\mathbf{x}^\top\mathbf{a}\,\mathbf{a}^\top$

(C9) $\dfrac{d(\mathbf{A}\mathbf{x})}{d\mathbf{x}} = \mathbf{A}$

(C10) $\dfrac{d(\mathbf{x}^\top\mathbf{A})}{d\mathbf{x}} = \mathbf{A}^\top$

(C11) $\dfrac{d(\mathbf{x}^\top\mathbf{A}\mathbf{x})}{d\mathbf{x}} = \mathbf{x}^\top(\mathbf{A} + \mathbf{A}^\top)$

Derivatives of Scalar by Scalar

(SS1) $\dfrac{\partial(u+v)}{\partial x} = \dfrac{\partial u}{\partial x} + \dfrac{\partial v}{\partial x}$

(SS2) $\dfrac{\partial(uv)}{\partial x} = u\dfrac{\partial v}{\partial x} + v\dfrac{\partial u}{\partial x}$ (product rule)

(SS3) $\dfrac{\partial g(u)}{\partial x} = \dfrac{\partial g(u)}{\partial u}\,\dfrac{\partial u}{\partial x}$ (chain rule)

(SS4) $\dfrac{\partial f(g(u))}{\partial x} = \dfrac{\partial f(g)}{\partial g}\,\dfrac{\partial g(u)}{\partial u}\,\dfrac{\partial u}{\partial x}$ (chain rule)

Derivatives of Vector by Scalar

(VS1) $\dfrac{\partial(a\mathbf{u})}{\partial x} = a\,\dfrac{\partial\mathbf{u}}{\partial x}$, where $a$ is not a function of $x$.

(VS2) $\dfrac{\partial(\mathbf{A}\mathbf{u})}{\partial x} = \mathbf{A}\,\dfrac{\partial\mathbf{u}}{\partial x}$, where $\mathbf{A}$ is not a function of $x$.

(VS3) $\dfrac{\partial\mathbf{u}}{\partial x}$ …
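Identities such as (C7) and (C11) can be sanity-checked with central differences. A minimal sketch in the numerator layout; the helper name `grad_row` and the random test data are my own assumptions:

```python
import numpy as np

def grad_row(f, x, eps=1e-6):
    """Central-difference derivative of scalar f by vector x,
    in numerator layout: a 1 x n row vector."""
    g = np.zeros((1, x.size))
    for j in range(x.size):
        dx = np.zeros(x.size)
        dx[j] = eps
        g[0, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
x = rng.normal(size=4)
A = rng.normal(size=(4, 4))

# (C7): d(x^T x)/dx = 2 x^T
assert np.allclose(grad_row(lambda z: z @ z, x), 2 * x[None, :], atol=1e-5)

# (C11): d(x^T A x)/dx = x^T (A + A^T)
assert np.allclose(grad_row(lambda z: z @ A @ z, x),
                   (x @ (A + A.T))[None, :], atol=1e-5)
```

Note that when $\mathbf{A}$ is symmetric, (C11) reduces to $2\mathbf{x}^\top\mathbf{A}$, which is the form used when differentiating the least-squares error $E$.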