In this lecture we will go through some of the key concepts of linear algebra and inverse problem theory that are required to develop the theory behind the different machine learning algorithms presented in this course. This is not meant to be an exhaustive treatise, and students are strongly advised to take the ErSE 213 - Inverse Problems course prior to this one.
Three key mathematical objects arise in the study of linear algebra:
Scalars: $a \in \mathbb{R}$, a single number represented by a lower case italic letter;
Vectors: $\mathbf{x}=[x_1, x_2, ..., x_N]^T \in \mathbb{R}^N$, an ordered collection of $N$ numbers represented by a lower case bold letter. It is sometimes useful to extract a subset of elements by defining a set of indices $S$ and adding it as a superscript, $\mathbf{x}^S$. As an example, given $\mathbf{x}=[x_1, x_2, x_3, x_4, x_5, x_6]^T \in \mathbb{R}^6$ and $S=\{1,3,5\}$, we can define the vector $\mathbf{x}^S=[x_1, x_3, x_5]^T$ and its complementary vector $\mathbf{x}^{-S}=[x_2, x_4, x_6]^T$;
Matrices: $\mathbf{X} \in \mathbb{R}^{N \times M}$, a two-dimensional collection of numbers represented by an upper case bold letter, where $N$ and $M$ are referred to as the height and width of the matrix. More specifically, a matrix can be written as

$$\mathbf{X} = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,M} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ X_{N,1} & X_{N,2} & \cdots & X_{N,M} \end{bmatrix}$$

A matrix can be indexed by rows, $\mathbf{X}_{i,:}$ (i-th row), by columns, $\mathbf{X}_{:,j}$ (j-th column), and by element, $X_{i,j}$ (i-th row, j-th column). A short NumPy sketch illustrating these objects and the indexing/subsetting notation is shown below, after which a number of useful operations that are commonly applied to vectors and matrices are described:
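This is a minimal sketch with arbitrary array values; note that NumPy uses 0-based indexing, so the index set $S=\{1,3,5\}$ from the example above corresponds to indices 0, 2, 4:

```python
import numpy as np

# Vector x in R^6 and an index set S = {1, 3, 5} (0-based: 0, 2, 4)
x = np.array([1., 2., 3., 4., 5., 6.])
S = np.array([0, 2, 4])
x_S = x[S]               # subset vector with elements x1, x3, x5
x_notS = np.delete(x, S) # complementary vector with elements x2, x4, x6

# Matrix X in R^{3x4} and its row/column/element indexing
X = np.arange(12.).reshape(3, 4)
row_i = X[1, :]          # second row
col_j = X[:, 2]          # third column
elem = X[1, 2]           # single element
```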
Transpose: $\mathbf{Y}=\mathbf{X}^T$, where $Y_{i,j}=X_{j,i}$
Matrix plus vector: $\mathbf{Y}_{[N \times M]}=\mathbf{X}_{[N \times M]}+\mathbf{z}_{[1 \times M]}$, where $Y_{i,j}=X_{i,j}+z_j$ ($\mathbf{z}$ is added to each row of the matrix $\mathbf{X}$)
Matrix-vector product: $\mathbf{y}_{[N \times 1]}=\mathbf{A}_{[N \times M]}\mathbf{x}_{[M \times 1]}$, where $y_i=\sum_{j=1}^M A_{i,j}x_j$
Matrix-matrix product: $\mathbf{C}_{[N \times K]}=\mathbf{A}_{[N \times M]}\mathbf{B}_{[M \times K]}$, where $C_{i,k}=\sum_{j=1}^M A_{i,j}B_{j,k}$
Hadamard product (i.e., element-wise product): $\mathbf{C}_{[N \times M]}=\mathbf{A}_{[N \times M]} \odot \mathbf{B}_{[N \times M]}$, where $C_{i,j}=A_{i,j}B_{i,j}$
Dot product: $a=\mathbf{x}_{[N \times 1]}^T\mathbf{y}_{[N \times 1]}=\sum_{i=1}^N x_i y_i$
Identity matrix: $\mathbf{I}_N=diag\{\mathbf{1}_N\}$. Based on its definition, we have that $\mathbf{I}_N\mathbf{x}=\mathbf{x}$ and $\mathbf{I}_N\mathbf{X}=\mathbf{X}$
Inverse matrix: given $\mathbf{y}=\mathbf{A}\mathbf{x}$, the inverse matrix of $\mathbf{A}$ is the matrix that satisfies the equality $\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_N$. We can finally write $\mathbf{x}=\mathbf{A}^{-1}\mathbf{y}$
Orthogonal vectors and matrices: two vectors $\mathbf{x}$ and $\mathbf{y}$ are said to be orthogonal if $\mathbf{y}^T\mathbf{x}=0$. A square matrix $\mathbf{X}$ is said to be orthogonal if its columns are mutually orthonormal, i.e., $\mathbf{X}^T\mathbf{X}=\mathbf{I}_N$. Orthogonal matrices are especially interesting because their inverse is simply $\mathbf{X}^{-1}=\mathbf{X}^T$
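The following NumPy sketch illustrates these operations; the array sizes and values are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
x = rng.standard_normal(4)
z = rng.standard_normal(4)

At = A.T                      # transpose
Y = A + z                     # matrix plus (row) vector via broadcasting
y = A @ x                     # matrix-vector product
C = A @ B                     # matrix-matrix product
H = A * A                     # Hadamard (element-wise) product
a = x @ z                     # dot product x^T z
I4 = np.eye(4)                # identity matrix
assert np.allclose(I4 @ x, x)

# Inverse of a (non-singular) square matrix
M = rng.standard_normal((4, 4))
Minv = np.linalg.inv(M)
assert np.allclose(Minv @ M, I4)

# Orthogonal matrix: Q^T Q = I, so its inverse is its transpose
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
assert np.allclose(Q.T @ Q, I4)
```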
Matrix decomposition: just as any integer can be decomposed into a product of prime numbers, a matrix $\mathbf{A}$ can also be decomposed into a combination of vectors (i.e., eigenvectors) and scalars (i.e., eigenvalues).
- Eigendecomposition: real-valued, square, symmetric matrices can be written as

  $$\mathbf{A}=\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T=\sum_i \lambda_i \mathbf{v}_i\mathbf{v}_i^T$$

  where $\lambda_i$ and $\mathbf{v}_i$ are the eigenvalues and eigenvectors of the matrix $\mathbf{A}$, respectively. Eigenvectors are placed along the columns of the matrix $\mathbf{V}$, which is an orthogonal matrix (i.e., $\mathbf{V}^T=\mathbf{V}^{-1}$). Eigenvalues are placed along the diagonal of the matrix $\boldsymbol{\Lambda}=diag\{\boldsymbol{\lambda}\}$ and tell us about the rank of the matrix, $rank(\mathbf{A})=\#\{\lambda_i \neq 0\}$. A full-rank matrix is a matrix whose eigenvalues are all non-zero, and such a matrix can be inverted. In this case the inverse of $\mathbf{A}$ is $\mathbf{A}^{-1}=\mathbf{V}\boldsymbol{\Lambda}^{-1}\mathbf{V}^T$
- Singular value decomposition (SVD): this is a more general decomposition which can be applied to real-valued, non-square, non-symmetric matrices. Singular vectors $\mathbf{u}$ and $\mathbf{v}$ and singular values $\lambda$ generalize the concepts of eigenvectors and eigenvalues. The matrix $\mathbf{A}$ can be decomposed as $\mathbf{A}=\mathbf{U}\mathbf{D}\mathbf{V}^T$, where $\mathbf{D}=\boldsymbol{\Lambda}$ for square matrices, $\mathbf{D}=[\boldsymbol{\Lambda} \quad \mathbf{0}]^T$ for $N>M$, and $\mathbf{D}=[\boldsymbol{\Lambda} \quad \mathbf{0}]$ for $M>N$. Similar to the eigendecomposition, in this case the inverse of $\mathbf{A}$ is $\mathbf{A}^{-1}=\mathbf{V}\mathbf{D}^{-1}\mathbf{U}^T$
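A short NumPy sketch of both decompositions; the matrices below are randomly generated examples, and the symmetric one is built to be positive definite so that it is full rank:

```python
import numpy as np

rng = np.random.default_rng(0)

# Eigendecomposition of a real, square, symmetric (full-rank) matrix
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)            # symmetric, positive definite by construction
lam, V = np.linalg.eigh(A)             # eigenvalues and orthonormal eigenvectors
assert np.allclose(V @ np.diag(lam) @ V.T, A)
assert np.allclose(V @ np.diag(1 / lam) @ V.T, np.linalg.inv(A))
print("rank:", np.sum(np.abs(lam) > 1e-10))

# SVD of a real, non-square matrix
G = rng.standard_normal((6, 4))
U, d, Vt = np.linalg.svd(G, full_matrices=False)
assert np.allclose(U @ np.diag(d) @ Vt, G)
# (Pseudo-)inverse built from the SVD factors
G_pinv = Vt.T @ np.diag(1 / d) @ U.T
assert np.allclose(G_pinv, np.linalg.pinv(G))
```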
Conditioning: in general, conditioning refers to how fast a function $f(x)$ changes given a small change in its input $x$. Similarly, for a matrix, conditioning is linked to the curvature of its associated quadratic form $f(\mathbf{x})=\mathbf{x}^T\mathbf{A}\mathbf{x}$, and it generally indicates how rapidly this function changes as a function of $\mathbf{x}$. It is defined as $cond(\mathbf{A})=\frac{|\lambda_{max}|}{|\lambda_{min}|}$.
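As a toy example, for a symmetric matrix the eigenvalue ratio above agrees with NumPy's built-in condition number:

```python
import numpy as np

# Condition number of a symmetric matrix from its eigenvalues
A = np.diag([1.0, 2.0, 100.0])                     # large eigenvalue spread
lam = np.linalg.eigvalsh(A)
cond_from_eigs = np.abs(lam).max() / np.abs(lam).min()
print(cond_from_eigs, np.linalg.cond(A))           # both give 100 for this matrix
```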
Norms: another important object that we will be using when defining cost functions for ML models is the norm. A norm is a function that maps a vector $\mathbf{x} \in \mathbb{R}^N$ to a scalar $d \in \mathbb{R}$, and it can be loosely seen as a measure of the length of the vector (i.e., its distance from the origin). In general, the $L_p$ norm is defined as:

$$||\mathbf{x}||_p = \left( \sum_{i=1}^N |x_i|^p \right)^{1/p}$$

Commonly used norms include:
Euclidean norm ($L_2$): $||\mathbf{x}||_2=\sqrt{\sum_i x_i^2}$, the distance of a vector from the origin of the N-dimensional Euclidean space. Note that $||\mathbf{x}||_2^2=\mathbf{x}^T\mathbf{x}$ and that $||\mathbf{x}||_2=1$ for a unit vector;
$L_1$ norm: $||\mathbf{x}||_1=\sum_i |x_i|$;
$L_0$ norm: $||\mathbf{x}||_0$, the number of non-zero elements in the vector $\mathbf{x}$.
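These norms can be computed with NumPy as follows (a small worked example on an arbitrary vector):

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0])
print(np.linalg.norm(x, 2))   # L2 norm: sqrt(3^2 + 4^2) = 5
print(np.linalg.norm(x, 1))   # L1 norm: |3| + |0| + |-4| = 7
print(np.linalg.norm(x, 0))   # "L0 norm": number of non-zero elements = 2
print(x @ x)                  # squared L2 norm, x^T x = 25
```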