MATH 235: Linear Algebra 2 for Honours Mathematics

Chapter 7: Fundamental Subspaces

This chapter reviews key concepts from MATH 136 (subspaces, bases, dimension, linear mappings) and establishes the tools needed for the rest of the course.

7.1 Bases of Fundamental Subspaces

Definition (Fundamental Subspaces). Let \(A\) be an \(m \times n\) matrix. The four fundamental subspaces of \(A\) are:

  1. The columnspace \(\text{Col}(A) = \{A\vec{x} \in \mathbb{R}^m \mid \vec{x} \in \mathbb{R}^n\}\).
  2. The rowspace \(\text{Row}(A) = \{A^T \vec{x} \in \mathbb{R}^n \mid \vec{x} \in \mathbb{R}^m\}\).
  3. The nullspace \(\text{Null}(A) = \{\vec{x} \in \mathbb{R}^n \mid A\vec{x} = \vec{0}\}\).
  4. The left nullspace \(\text{Null}(A^T) = \{\vec{x} \in \mathbb{R}^m \mid A^T\vec{x} = \vec{0}\}\).

Theorem 7.1.1. Let \(A\) be an \(m \times n\) matrix. Then \(\text{Col}(A)\) and \(\text{Null}(A^T)\) are subspaces of \(\mathbb{R}^m\), and \(\text{Row}(A)\) and \(\text{Null}(A)\) are subspaces of \(\mathbb{R}^n\).

Theorem 7.1.2. Let \(A\) be an \(m \times n\) matrix. The columns of \(A\) which correspond to leading ones in the reduced row echelon form of \(A\) form a basis for \(\text{Col}(A)\). Moreover, \(\dim \text{Col}(A) = \text{rank}\, A\).

Theorem 7.1.3. Let \(A\) be an \(m \times n\) matrix. The non-zero rows in the reduced row echelon form of \(A\) form a basis for \(\text{Row}(A)\). Hence, \(\dim \text{Row}(A) = \text{rank}\, A\).

By the Dimension Theorem, if \(A\) is an \(m \times n\) matrix with \(\text{rank}\, A = r\), then

\[\text{rank}\, A + \dim \text{Null}(A) = n\]

and hence

\[\dim \text{Null}(A) = n - r \qquad \dim \text{Null}(A^T) = m - r\]
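As a sanity check (a NumPy illustration, not part of the course notes), these dimension formulas can be verified numerically by counting near-zero singular values to measure nullity:

```python
import numpy as np

# A 3x4 matrix of rank 2 (third row = first row + second row).
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 1.0, 1.0]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Nullspace dimension via SVD: columns minus the number of nonzero singular values.
def nullity(M, tol=1e-10):
    s = np.linalg.svd(M, compute_uv=False)
    return M.shape[1] - np.sum(s > tol)

assert r == 2
assert nullity(A) == n - r      # dim Null(A)   = n - r
assert nullity(A.T) == m - r    # dim Null(A^T) = m - r
```

The same check works for any matrix, since the rank equals the number of nonzero singular values.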

7.2 Subspaces of Linear Mappings

Recall that every linear mapping \(L : \mathbb{R}^n \to \mathbb{R}^m\) has a standard matrix \([L]\) satisfying \(L(\vec{x}) = [L]\vec{x}\). The range and kernel of \(L\) are linked to the fundamental subspaces of \([L]\).

Theorem 7.2.1. If \(L : \mathbb{R}^n \to \mathbb{R}^m\) is a linear mapping, then \(\text{Range}(L) = \text{Col}([L])\) and \(\text{rank}[L] = \dim \text{Range}(L)\).

Theorem 7.2.2. If \(L : \mathbb{R}^n \to \mathbb{R}^m\) is a linear mapping, then \(\text{Ker}(L) = \text{Null}([L])\) and \(\dim(\text{Ker}(L)) = n - \text{rank}[L]\).

Combining these results gives

\[\dim \text{Range}(L) + \dim \text{Ker}(L) = \dim(\mathbb{R}^n)\]

Chapter 8: Linear Mappings

This chapter generalizes linear mappings from \(\mathbb{R}^n \to \mathbb{R}^m\) to mappings between arbitrary vector spaces, develops the Rank-Nullity Theorem, introduces the matrix of a linear mapping with respect to chosen bases, and studies isomorphisms.

8.1 General Linear Mappings

Definition (Linear Mapping). Let \(V\) and \(W\) be vector spaces. A mapping \(L : V \to W\) is called linear if

\[L(s\vec{x} + t\vec{y}) = sL(\vec{x}) + tL(\vec{y})\]

for all \(\vec{x}, \vec{y} \in V\) and \(s, t \in \mathbb{R}\).

Theorem 8.1.1. Let \(V\) and \(W\) be vector spaces and let \(L : V \to W\) be a linear mapping. Then \(L(\vec{0}) = \vec{0}\).

Definition (Addition and Scalar Multiplication of Mappings). Let \(L : V \to W\) and \(M : V \to W\) be linear mappings. We define \(L + M\) by \((L + M)(\vec{v}) = L(\vec{v}) + M(\vec{v})\), and for any \(t \in \mathbb{R}\) we define \(tL\) by \((tL)(\vec{v}) = tL(\vec{v})\).

Theorem 8.1.2. Let \(V\) and \(W\) be vector spaces. The set \(\mathcal{L}\) of all linear mappings \(L : V \to W\) with standard addition and scalar multiplication of mappings is a vector space.

Definition (Composition). Let \(L : V \to W\) and \(M : W \to U\) be linear mappings. Then we define \(M \circ L\) by \((M \circ L)(\vec{v}) = M(L(\vec{v}))\) for all \(\vec{v} \in V\).

8.2 Rank-Nullity Theorem

Definition (Range, Kernel). Let \(L : V \to W\) be a linear mapping. The kernel of \(L\) is \(\text{Ker}(L) = \{\vec{v} \in V \mid L(\vec{v}) = \vec{0}_W\}\) and the range of \(L\) is \(\text{Range}(L) = \{L(\vec{v}) \mid \vec{v} \in V\}\).

Theorem 8.2.1. Let \(L : V \to W\) be a linear mapping. Then \(\text{Ker}(L)\) is a subspace of \(V\) and \(\text{Range}(L)\) is a subspace of \(W\).

Definition (Rank, Nullity). Let \(L : V \to W\) be a linear mapping. The rank of \(L\) is \(\text{rank}(L) = \dim(\text{Range}(L))\). The nullity of \(L\) is \(\text{nullity}(L) = \dim(\text{Ker}(L))\).

Theorem 8.2.2 (Rank-Nullity Theorem). Let \(V\) be an \(n\)-dimensional vector space and let \(L : V \to W\) be a linear mapping. Then

\[\text{rank}(L) + \text{nullity}(L) = n\]

The proof extends the basis of \(\text{Ker}(L)\) to a basis of \(V\) and shows the images of the extra basis vectors form a basis for \(\text{Range}(L)\), exactly mirroring the proof of the Dimension Theorem.

8.3 Matrix of a Linear Mapping

Using coordinate vectors, every linear mapping between finite-dimensional vector spaces can be represented as a matrix mapping.

Definition (Matrix of a Linear Mapping). Let \(L : V \to W\) be a linear mapping, let \(B = \{\vec{v}_1, \ldots, \vec{v}_n\}\) be a basis for \(V\), and let \(C\) be a basis for \(W\). The matrix of \(L\) with respect to \(B\) and \(C\) is

\[{}_C[L]_B = \begin{bmatrix} [L(\vec{v}_1)]_C & \cdots & [L(\vec{v}_n)]_C \end{bmatrix}\]

It satisfies \([L(\vec{x})]_C = {}_C[L]_B [\vec{x}]_B\) for all \(\vec{x} \in V\).

Definition (Matrix of a Linear Operator). If \(L : V \to V\) is a linear operator and \(B = \{\vec{v}_1, \ldots, \vec{v}_n\}\) is a basis for \(V\), the \(B\)-matrix of \(L\) is

\[[L]_B = \begin{bmatrix} [L(\vec{v}_1)]_B & \cdots & [L(\vec{v}_n)]_B \end{bmatrix}\]

It satisfies \([L(\vec{x})]_B = [L]_B [\vec{x}]_B\) for all \(\vec{x} \in V\).

Remark. When the basis \(B\) consists of eigenvectors of \(L\), the matrix \([L]_B\) is diagonal. This connection between diagonalization and the matrix of a linear operator is central to the course.

8.4 Isomorphisms

Definition (One-to-One, Onto). Let \(L : V \to W\) be a linear mapping. \(L\) is called one-to-one (injective) if \(L(\vec{u}) = L(\vec{v})\) implies \(\vec{u} = \vec{v}\). \(L\) is called onto (surjective) if for every \(\vec{w} \in W\), there exists \(\vec{v} \in V\) such that \(L(\vec{v}) = \vec{w}\).

Lemma 8.4.1. Let \(L : V \to W\) be a linear mapping. \(L\) is one-to-one if and only if \(\text{Ker}(L) = \{\vec{0}\}\).

Definition (Isomorphism). Let \(V\) and \(W\) be vector spaces. Then \(V\) is said to be isomorphic to \(W\) if there exists a linear mapping \(L : V \to W\) which is one-to-one and onto. \(L\) is called an isomorphism.

Theorem 8.4.2. Let \(V\) and \(W\) be finite-dimensional vector spaces. Then \(V\) is isomorphic to \(W\) if and only if \(\dim V = \dim W\).

Theorem 8.4.3. If \(V\) and \(W\) are isomorphic vector spaces and \(L : V \to W\) is linear, then \(L\) is one-to-one if and only if \(L\) is onto.

Theorem 8.4.4. Let \(V\) and \(W\) be isomorphic vector spaces and let \(\{\vec{v}_1, \ldots, \vec{v}_n\}\) be a basis for \(V\). Then a linear mapping \(L : V \to W\) is an isomorphism if and only if \(\{L(\vec{v}_1), \ldots, L(\vec{v}_n)\}\) is a basis for \(W\).


Chapter 9: Inner Products

This chapter extends the dot product to general inner products on vector spaces, develops orthogonality, the Gram-Schmidt procedure, projections, the Fundamental Theorem of Linear Algebra, and least squares.

9.1 Inner Product Spaces

Definition (Inner Product). Let \(V\) be a vector space. An inner product on \(V\) is a function \(\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}\) satisfying for every \(\vec{v}, \vec{u}, \vec{w} \in V\) and \(s, t \in \mathbb{R}\):

  • I1 (Positive Definite): \(\langle \vec{v}, \vec{v} \rangle \geq 0\) and \(\langle \vec{v}, \vec{v} \rangle = 0\) if and only if \(\vec{v} = \vec{0}\).
  • I2 (Symmetric): \(\langle \vec{v}, \vec{w} \rangle = \langle \vec{w}, \vec{v} \rangle\).
  • I3 (Bilinear): \(\langle s\vec{v} + t\vec{u}, \vec{w} \rangle = s\langle \vec{v}, \vec{w} \rangle + t\langle \vec{u}, \vec{w} \rangle\).

Definition (Inner Product Space). A vector space \(V\) with an inner product \(\langle \cdot, \cdot \rangle\) on \(V\) is called an inner product space.

Important examples include the standard dot product on \(\mathbb{R}^n\), the trace inner product \(\langle A, B \rangle = \text{tr}(B^T A)\) on \(M_{m \times n}(\mathbb{R})\), evaluation inner products such as \(\langle p, q \rangle = p(-1)q(-1) + p(0)q(0) + p(1)q(1)\) on \(P_2(\mathbb{R})\), and the integral inner product \(\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)g(x)\,dx\) on \(C[-\pi, \pi]\).

Theorem 9.1.1. Let \(V\) be an inner product space. Then for any \(\vec{v} \in V\), \(\langle \vec{v}, \vec{0} \rangle = 0\).

9.2 Orthogonality and Length

Definition (Length). Let \(V\) be an inner product space. For any \(\vec{v} \in V\), the length (or norm) of \(\vec{v}\) is \(\|\vec{v}\| = \sqrt{\langle \vec{v}, \vec{v} \rangle}\).

Definition (Unit Vector). If \(\vec{v} \in V\) satisfies \(\|\vec{v}\| = 1\), then \(\vec{v}\) is called a unit vector.

Theorem 9.2.1. Let \(V\) be an inner product space. For any \(\vec{v} \in V\) and \(t \in \mathbb{R}\), we have \(\|t\vec{v}\| = |t|\|\vec{v}\|\). Moreover, if \(\vec{v} \neq \vec{0}\), then \(\hat{v} = \frac{1}{\|\vec{v}\|}\vec{v}\) is a unit vector.

Definition (Orthogonal Vectors). Vectors \(\vec{x}, \vec{y}\) in an inner product space are orthogonal if \(\langle \vec{x}, \vec{y} \rangle = 0\).

Definition (Orthogonal Set). A set \(S = \{\vec{v}_1, \ldots, \vec{v}_k\}\) is orthogonal if \(\langle \vec{v}_i, \vec{v}_j \rangle = 0\) for all \(i \neq j\).

Theorem 9.2.2 (Generalized Pythagorean Theorem). If \(\{\vec{v}_1, \ldots, \vec{v}_k\}\) is an orthogonal set, then \(\|\vec{v}_1 + \cdots + \vec{v}_k\|^2 = \|\vec{v}_1\|^2 + \cdots + \|\vec{v}_k\|^2\).

Theorem 9.2.3. Let \(S = \{\vec{v}_1, \ldots, \vec{v}_k\}\) be an orthogonal set in an inner product space with \(\vec{v}_i \neq \vec{0}\) for all \(i\). Then \(S\) is linearly independent.

Theorem 9.2.4. If \(\{\vec{v}_1, \ldots, \vec{v}_n\}\) is an orthogonal basis for an inner product space \(V\), then every \(\vec{v} \in V\) can be written as

\[\vec{v} = \frac{\langle \vec{v}, \vec{v}_1 \rangle}{\|\vec{v}_1\|^2}\vec{v}_1 + \cdots + \frac{\langle \vec{v}, \vec{v}_n \rangle}{\|\vec{v}_n\|^2}\vec{v}_n\]

Definition (Orthonormal Set, Orthonormal Basis). An orthogonal set \(S = \{\vec{v}_1, \ldots, \vec{v}_k\}\) with \(\|\vec{v}_i\| = 1\) for all \(i\) is called orthonormal. An orthonormal set that is a basis for \(V\) is an orthonormal basis.

Theorem 9.2.5. If \(\{\vec{v}_1, \ldots, \vec{v}_n\}\) is an orthonormal basis for \(V\), then for any \(\vec{v} \in V\),

\[\vec{v} = \langle \vec{v}, \vec{v}_1 \rangle \vec{v}_1 + \cdots + \langle \vec{v}, \vec{v}_n \rangle \vec{v}_n\]

Theorem 9.2.6. For an \(n \times n\) matrix \(P\), the following are equivalent:

  1. The columns of \(P\) form an orthonormal basis for \(\mathbb{R}^n\).
  2. \(P^T = P^{-1}\).
  3. The rows of \(P\) form an orthonormal basis for \(\mathbb{R}^n\).

Definition (Orthogonal Matrix). An \(n \times n\) matrix \(P\) whose columns form an orthonormal basis for \(\mathbb{R}^n\) is called an orthogonal matrix.

Theorem 9.2.7. Let \(P\) and \(Q\) be \(n \times n\) orthogonal matrices and \(\vec{x}, \vec{y} \in \mathbb{R}^n\). Then:

  1. \((P\vec{x}) \cdot (P\vec{y}) = \vec{x} \cdot \vec{y}\).
  2. \(\|P\vec{x}\| = \|\vec{x}\|\).
  3. \(\det P = \pm 1\).
  4. All real eigenvalues of \(P\) are \(\pm 1\).
  5. \(PQ\) is orthogonal.
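These properties can be illustrated numerically (a NumPy check, not part of the notes) using a \(2 \times 2\) rotation matrix, a standard example of an orthogonal matrix:

```python
import numpy as np

# A rotation matrix is orthogonal: its columns are orthonormal.
theta = 0.7
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])

assert np.isclose((P @ x) @ (P @ y), x @ y)                   # preserves dot products
assert np.isclose(np.linalg.norm(P @ x), np.linalg.norm(x))   # preserves lengths
assert np.isclose(abs(np.linalg.det(P)), 1.0)                 # det P = +/- 1
assert np.allclose(P.T @ P, np.eye(2))                        # P^T = P^{-1}
```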

9.3 The Gram-Schmidt Procedure

Given any basis for a subspace of an inner product space, the Gram-Schmidt procedure converts it into an orthogonal basis. Each new vector is obtained by subtracting the projection onto the span of the previously computed vectors.

Let \(\{\vec{w}_1, \ldots, \vec{w}_k\}\) be a basis for a subspace \(W\) of an inner product space. Define

\[\vec{v}_1 = \vec{w}_1\]

\[\vec{v}_2 = \vec{w}_2 - \frac{\langle \vec{w}_2, \vec{v}_1 \rangle}{\|\vec{v}_1\|^2}\vec{v}_1\]

\[\vec{v}_i = \vec{w}_i - \frac{\langle \vec{w}_i, \vec{v}_1 \rangle}{\|\vec{v}_1\|^2}\vec{v}_1 - \frac{\langle \vec{w}_i, \vec{v}_2 \rangle}{\|\vec{v}_2\|^2}\vec{v}_2 - \cdots - \frac{\langle \vec{w}_i, \vec{v}_{i-1} \rangle}{\|\vec{v}_{i-1}\|^2}\vec{v}_{i-1}\]

for \(3 \leq i \leq k\). Then \(\text{Span}\{\vec{v}_1, \ldots, \vec{v}_i\} = \text{Span}\{\vec{w}_1, \ldots, \vec{w}_i\}\) for \(1 \leq i \leq k\). In particular, \(\{\vec{v}_1, \ldots, \vec{v}_k\}\) is an orthogonal basis for \(W\).

If a vector \(\vec{v}_i\) turns out to be \(\vec{0}\), this indicates that \(\vec{w}_i\) was a linear combination of the previous vectors; we simply skip it and continue.
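The procedure, including the skip rule for dependent vectors, can be sketched in NumPy (an illustration using the standard dot product; the function name is ours):

```python
import numpy as np

# Gram-Schmidt with the standard dot product: subtract from each w the
# projection onto every previously computed v; skip vectors that reduce to 0.
def gram_schmidt(vectors, tol=1e-12):
    basis = []
    for w in vectors:
        v = w.astype(float).copy()
        for b in basis:
            v -= (w @ b) / (b @ b) * b   # subtract proj_b(w)
        if np.linalg.norm(v) > tol:      # skip dependent vectors
            basis.append(v)
    return basis

W = [np.array([1.0, 1.0, 0.0]),
     np.array([2.0, 2.0, 0.0]),   # dependent on the first: skipped
     np.array([1.0, 0.0, 1.0])]
B = gram_schmidt(W)
assert len(B) == 2                       # one vector was skipped
assert np.isclose(B[0] @ B[1], 0.0)      # result is orthogonal
```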

[Figure: Gram-Schmidt for two vectors — \(\vec{v}_1 = \vec{w}_1\) and \(\vec{v}_2 = \vec{w}_2 - \text{proj}_{\vec{v}_1}(\vec{w}_2)\), so that \(\vec{v}_1 \perp \vec{v}_2\).]

9.4 General Projections

Definition (Orthogonal Complement). Let \(W\) be a subspace of an inner product space \(V\). The orthogonal complement of \(W\) is

\[W^\perp = \{\vec{v} \in V \mid \langle \vec{w}, \vec{v} \rangle = 0 \text{ for all } \vec{w} \in W\}\]

Theorem 9.4.1. Let \(\{\vec{v}_1, \ldots, \vec{v}_k\}\) be a basis for a subspace \(W\) of an inner product space \(V\). If \(\langle \vec{x}, \vec{v}_i \rangle = 0\) for \(1 \leq i \leq k\), then \(\vec{x} \in W^\perp\).

Theorem 9.4.2. Let \(W\) be a finite-dimensional subspace of an inner product space \(V\). Then:

  1. \(W^\perp\) is a subspace of \(V\).
  2. If \(\dim V = n\), then \(\dim W^\perp = n - \dim W\).
  3. If \(\dim V = n\), then \((W^\perp)^\perp = W\).
  4. \(W \cap W^\perp = \{\vec{0}\}\).
  5. If \(\dim V = n\), \(\{\vec{v}_1, \ldots, \vec{v}_k\}\) is an orthogonal basis for \(W\), and \(\{\vec{v}_{k+1}, \ldots, \vec{v}_n\}\) is an orthogonal basis for \(W^\perp\), then \(\{\vec{v}_1, \ldots, \vec{v}_n\}\) is an orthogonal basis for \(V\).

Definition (Projection and Perpendicular). Let \(\{\vec{v}_1, \ldots, \vec{v}_k\}\) be an orthogonal basis for a subspace \(W\) of \(V\). For any \(\vec{v} \in V\), define

\[\text{proj}_W \vec{v} = \frac{\langle \vec{v}, \vec{v}_1 \rangle}{\|\vec{v}_1\|^2}\vec{v}_1 + \cdots + \frac{\langle \vec{v}, \vec{v}_k \rangle}{\|\vec{v}_k\|^2}\vec{v}_k\]

\[\text{perp}_W \vec{v} = \vec{v} - \text{proj}_W \vec{v}\]

Theorem 9.4.3. For any \(\vec{v} \in V\), \(\text{perp}_W \vec{v} = \vec{v} - \text{proj}_W \vec{v} \in W^\perp\).

Theorem 9.4.4. The mapping \(\text{proj}_W\) is a linear operator on \(V\) with kernel \(W^\perp\).

Theorem 9.4.5. For any \(\vec{v} \in V\), \(\text{proj}_{W^\perp} \vec{v} = \text{perp}_W \vec{v}\).
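With an orthogonal basis for \(W\) in hand, the projection and perpendicular formulas can be sketched numerically (a NumPy illustration using the standard dot product; the helper names are ours):

```python
import numpy as np

# proj_W and perp_W from an orthogonal basis of W, per the formulas above.
def proj(v, orth_basis):
    return sum((v @ b) / (b @ b) * b for b in orth_basis)

def perp(v, orth_basis):
    return v - proj(v, orth_basis)

# W = the xy-plane in R^3, with orthogonal basis {e1, e2}.
basis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
v = np.array([3.0, -2.0, 5.0])

assert np.allclose(proj(v, basis), [3.0, -2.0, 0.0])
# perp_W(v) lies in W^perp: it is orthogonal to every basis vector of W.
assert all(np.isclose(perp(v, basis) @ b, 0.0) for b in basis)
```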

9.5 The Fundamental Theorem

Definition (Direct Sum). Let \(U\) and \(W\) be subspaces of \(V\) with \(U \cap W = \{\vec{0}\}\). The direct sum is \(U \oplus W = \{\vec{u} + \vec{w} \in V \mid \vec{u} \in U, \vec{w} \in W\}\).

Theorem 9.5.1. If \(U \cap W = \{\vec{0}\}\), then \(U \oplus W\) is a subspace of \(V\). Moreover, if \(\{\vec{v}_1, \ldots, \vec{v}_k\}\) is a basis for \(U\) and \(\{\vec{w}_1, \ldots, \vec{w}_\ell\}\) is a basis for \(W\), then \(\{\vec{v}_1, \ldots, \vec{v}_k, \vec{w}_1, \ldots, \vec{w}_\ell\}\) is a basis for \(U \oplus W\).

Theorem 9.5.2. Let \(V\) be a finite-dimensional inner product space and \(W\) a subspace of \(V\). Then \(W \oplus W^\perp = V\).

Theorem (Fundamental Theorem of Linear Algebra). For any \(m \times n\) matrix \(A\),

\[\mathbb{R}^n = \text{Row}(A) \oplus \text{Null}(A) \qquad \mathbb{R}^m = \text{Col}(A) \oplus \text{Null}(A^T)\]

This theorem implies the Rank-Nullity Theorem and many results about systems of linear equations.

9.6 The Method of Least Squares

When \(A\vec{x} = \vec{b}\) is inconsistent, we seek \(\vec{x}\) minimizing \(\|A\vec{x} - \vec{b}\|\). The Approximation Theorem tells us the closest vector in \(\text{Col}(A)\) to \(\vec{b}\) is \(\text{proj}_{\text{Col}(A)} \vec{b}\), leading to the normal equations.

Theorem 9.6.1 (Approximation Theorem). Let \(W\) be a finite-dimensional subspace of an inner product space \(V\). If \(\vec{v} \in V\), then the vector in \(W\) closest to \(\vec{v}\) is \(\text{proj}_W \vec{v}\). That is, \(\|\vec{v} - \text{proj}_W \vec{v}\| < \|\vec{v} - \vec{w}\|\) for all \(\vec{w} \in W\), \(\vec{w} \neq \text{proj}_W \vec{v}\).

By the Fundamental Theorem, \(\vec{x}\) minimizes \(\|A\vec{x} - \vec{b}\|\) exactly when \(\vec{b} - A\vec{x} \in \text{Null}(A^T)\), giving the normal system \(A^T A \vec{x} = A^T \vec{b}\).

[Figure: Least squares — \(\vec{b}\) is projected onto \(\text{Col}(A)\); since \(\vec{b} - A\hat{x} \perp \text{Col}(A)\), we get \(A^T A\hat{x} = A^T \vec{b}\).]
To fit a polynomial \(p(x) = a_0 + a_1 x + \cdots + a_n x^n\) to data points \((x_1, y_1), \ldots, (x_m, y_m)\), form

\[X = \begin{bmatrix} 1 & x_1 & \cdots & x_1^n \\ \vdots & \vdots & & \vdots \\ 1 & x_m & \cdots & x_m^n \end{bmatrix}, \quad \vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}\]

If \(\vec{a}\) solves the normal system \(X^T X \vec{a} = X^T \vec{y}\), then \(p(x) = a_0 + a_1 x + \cdots + a_n x^n\) is the best fitting polynomial of degree \(n\). Moreover, if at least \(n+1\) of the \(x_i\) are distinct, then \(X^T X\) is invertible and \(\vec{a} = (X^T X)^{-1} X^T \vec{y}\).
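The normal-equation recipe can be sketched for a line of best fit (a NumPy illustration; the data values are made up):

```python
import numpy as np

# Fit a degree-1 polynomial (a line) to four data points via the normal equations.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.1])

# Design matrix X with columns 1, x.
X = np.column_stack([np.ones_like(x), x])

# Normal system: X^T X a = X^T y.
a = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with NumPy's built-in least-squares solver.
a_np, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(a, a_np)

# The residual y - Xa is orthogonal to Col(X), as the theory predicts.
assert np.allclose(X.T @ (y - X @ a), 0.0, atol=1e-9)
```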


Chapter 10: Applications of Orthogonal Matrices

10.1 Orthogonal Similarity

Definition (Orthogonally Similar). Two matrices \(A\) and \(B\) are orthogonally similar if there exists an orthogonal matrix \(P\) such that \(P^T AP = B\).

Theorem 10.1.1 (Triangularization Theorem). If \(A\) is an \(n \times n\) matrix with real eigenvalues, then \(A\) is orthogonally similar to an upper triangular matrix \(T\).

The proof proceeds by induction on \(n\): find a unit eigenvector, extend to an orthonormal basis via Gram-Schmidt, form the orthogonal matrix \(P_1\), and reduce the problem to an \((n-1) \times (n-1)\) case.

10.2 Orthogonal Diagonalization

Definition (Orthogonally Diagonalizable). An \(n \times n\) matrix \(A\) is orthogonally diagonalizable if there exists an orthogonal matrix \(P\) and a diagonal matrix \(D\) such that \(P^T AP = D\).

Theorem 10.2.1. If \(A\) is orthogonally diagonalizable, then \(A^T = A\) (i.e., \(A\) is symmetric).

Definition (Symmetric Matrix). A matrix \(A\) such that \(A^T = A\) is called symmetric.

Lemma 10.2.2. If \(A\) is a symmetric matrix with real entries, then all of its eigenvalues are real.

The proof of this lemma requires complex inner products and is given in Chapter 11.

Theorem 10.2.3 (Principal Axis Theorem). Every symmetric matrix is orthogonally diagonalizable.

The proof uses the Triangularization Theorem: since \(A\) is symmetric and all eigenvalues are real, we get \(P^T AP = T\) upper triangular, but then \(T^T = T\) forces \(T\) to be diagonal.

Theorem 10.2.4. A matrix \(A\) is symmetric if and only if \(\vec{x} \cdot (A\vec{y}) = (A\vec{x}) \cdot \vec{y}\) for all \(\vec{x}, \vec{y} \in \mathbb{R}^n\).

Theorem 10.2.5. If \(\vec{v}_1, \vec{v}_2\) are eigenvectors of a symmetric matrix \(A\) corresponding to distinct eigenvalues, then \(\vec{v}_1\) is orthogonal to \(\vec{v}_2\).

To orthogonally diagonalize a symmetric matrix: find all eigenvalues, find a basis for each eigenspace, apply Gram-Schmidt within each eigenspace (if needed), and normalize to form the orthogonal matrix \(P\).
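This procedure can be illustrated with NumPy, whose `eigh` routine returns the eigenvalues and an orthonormal set of eigenvectors of a symmetric matrix (a sketch, not course material):

```python
import numpy as np

# Orthogonally diagonalize a symmetric matrix: P^T A P = D.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, P = np.linalg.eigh(A)   # eigh: for symmetric A, P has orthonormal columns
D = np.diag(eigvals)

assert np.allclose(P.T @ P, np.eye(2))          # P is orthogonal
assert np.allclose(P.T @ A @ P, D)              # P^T A P = D
assert np.allclose(sorted(eigvals), [1.0, 3.0]) # eigenvalues of this A are 1 and 3
```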

[Figure: Spectral Theorem — for symmetric \(A\), \(A = PDP^T\) with \(P\) orthogonal (\(P^T = P^{-1}\)) and \(D\) diagonal with real eigenvalues. Writing the orthonormal columns of \(P\) as \(\vec{q}_1, \ldots, \vec{q}_n\) gives the spectral decomposition \(A = \lambda_1 \vec{q}_1\vec{q}_1^T + \cdots + \lambda_n \vec{q}_n\vec{q}_n^T\).]

10.3 Quadratic Forms

Definition (Quadratic Form). A quadratic form on \(\mathbb{R}^n\) with corresponding symmetric matrix \(A\) is defined by \(Q(\vec{x}) = \vec{x}^T A \vec{x}\) for any \(\vec{x} \in \mathbb{R}^n\).

Every quadratic form can be represented by a unique symmetric matrix. The off-diagonal entry \(a_{ij}\) is half the coefficient of the cross-term \(x_i x_j\).

Definition (Classification of Quadratic Forms). A quadratic form \(Q(\vec{x})\) is:

  1. Positive definite if \(Q(\vec{x}) > 0\) for all \(\vec{x} \neq \vec{0}\).
  2. Negative definite if \(Q(\vec{x}) < 0\) for all \(\vec{x} \neq \vec{0}\).
  3. Indefinite if \(Q(\vec{x}) > 0\) for some \(\vec{x}\) and \(Q(\vec{x}) < 0\) for some \(\vec{x}\).
  4. Positive semidefinite if \(Q(\vec{x}) \geq 0\) for all \(\vec{x}\) and \(Q(\vec{x}) = 0\) for some \(\vec{x} \neq \vec{0}\).
  5. Negative semidefinite if \(Q(\vec{x}) \leq 0\) for all \(\vec{x}\) and \(Q(\vec{x}) = 0\) for some \(\vec{x} \neq \vec{0}\).

Theorem 10.3.1. Let \(A\) be the symmetric matrix of \(Q(\vec{x}) = \vec{x}^T A \vec{x}\). If \(P\) orthogonally diagonalizes \(A\), then \(Q(\vec{x}) = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2\) where \(\vec{y} = P^T \vec{x}\) and \(\lambda_1, \ldots, \lambda_n\) are the eigenvalues of \(A\).

Theorem 10.3.2 (Eigenvalue Test for Definiteness). Let \(A\) be a symmetric matrix. Then \(Q(\vec{x}) = \vec{x}^T A \vec{x}\) is:

  1. Positive definite iff all eigenvalues of \(A\) are positive.
  2. Negative definite iff all eigenvalues are negative.
  3. Indefinite iff some eigenvalues are positive and some are negative.
  4. Positive semidefinite iff eigenvalues are non-negative with at least one zero.
  5. Negative semidefinite iff eigenvalues are non-positive with at least one zero.
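The eigenvalue test translates directly into a short classifier (a NumPy sketch; the function name and tolerance are our choices):

```python
import numpy as np

# Classify Q(x) = x^T A x by the signs of the eigenvalues of symmetric A.
def classify(A, tol=1e-10):
    ev = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix (all real)
    if np.all(ev > tol):
        return "positive definite"
    if np.all(ev < -tol):
        return "negative definite"
    if np.any(ev > tol) and np.any(ev < -tol):
        return "indefinite"
    if np.all(ev > -tol):
        return "positive semidefinite"
    return "negative semidefinite"

assert classify(np.array([[2.0, 0.0], [0.0, 3.0]])) == "positive definite"
assert classify(np.array([[1.0, 0.0], [0.0, -1.0]])) == "indefinite"
assert classify(np.array([[1.0, 1.0], [1.0, 1.0]])) == "positive semidefinite"
```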

10.4 Graphing Quadratic Forms

Theorem 10.4.1. Let \(Q(\vec{x}) = ax_1^2 + bx_1 x_2 + cx_2^2\). There exists an orthogonal matrix \(P\) corresponding to a rotation such that the change of variables \(\vec{y} = P^T \vec{x}\) brings \(Q\) into diagonal form \(\lambda_1 y_1^2 + \lambda_2 y_2^2\).

The eigenvectors of the corresponding symmetric matrix are called the principal axes of the quadratic form. The level curve \(Q(\vec{x}) = k\) can be an ellipse, hyperbola, pair of lines, or a point depending on the signs of \(\lambda_1, \lambda_2\) and \(k\).

[Figure: level curves of \(\lambda_1 y_1^2 + \lambda_2 y_2^2 = 1\) in the \(y_1 y_2\)-plane — an ellipse when \(\lambda_1, \lambda_2 > 0\), a hyperbola when \(\lambda_1 > 0 > \lambda_2\).]

10.5 Optimizing Quadratic Forms

Theorem 10.5.1. Let \(Q(\vec{x})\) be a quadratic form with corresponding symmetric matrix \(A\). The maximum value of \(Q(\vec{x})\) subject to \(\|\vec{x}\| = 1\) is the greatest eigenvalue of \(A\), and the minimum is the least eigenvalue. These values are attained at the corresponding unit eigenvectors.

10.6 Singular Value Decomposition

The SVD extends diagonalization-like ideas to arbitrary \(m \times n\) matrices. We maximize \(\|A\vec{x}\|\) subject to \(\|\vec{x}\| = 1\) by examining the eigenvalues of \(A^T A\).

Theorem 10.6.1. Let \(A\) be an \(m \times n\) matrix and let \(\lambda_1, \ldots, \lambda_n\) be the eigenvalues of \(A^T A\) with corresponding unit eigenvectors \(\vec{v}_1, \ldots, \vec{v}_n\). Then \(\lambda_1, \ldots, \lambda_n\) are all non-negative, and \(\|A\vec{v}_i\| = \sqrt{\lambda_i}\).

Definition (Singular Values). The singular values \(\sigma_1, \ldots, \sigma_n\) of an \(m \times n\) matrix \(A\) are the square roots of the eigenvalues of \(A^T A\) arranged so that \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0\).

Theorem 10.6.2. Let \(A\) be an \(m \times n\) matrix. Then \(\text{rank}(A^T A) = \text{rank}(A)\).

Corollary 10.6.3. If \(A\) is \(m \times n\) with \(\text{rank}(A) = r\), then \(A\) has exactly \(r\) non-zero singular values.

Definition (Singular Vectors). Let \(A\) be \(m \times n\). If \(\vec{v} \in \mathbb{R}^n\) and \(\vec{u} \in \mathbb{R}^m\) are unit vectors and \(\sigma \neq 0\) is a singular value with \(A\vec{v} = \sigma \vec{u}\) and \(A^T \vec{u} = \sigma \vec{v}\), then \(\vec{u}\) is a left singular vector and \(\vec{v}\) is a right singular vector. Additionally, unit vectors in \(\text{Null}(A)\) are right singular vectors and unit vectors in \(\text{Null}(A^T)\) are left singular vectors.

Theorem 10.6.4. If \(\vec{v}\) is a right singular vector of \(A\), then \(\vec{v}\) is an eigenvector of \(A^T A\). If \(\vec{u}\) is a left singular vector of \(A\), then \(\vec{u}\) is an eigenvector of \(AA^T\).

Lemma 10.6.5. Let \(A\) be \(m \times n\) with rank \(r\), and let \(\{\vec{v}_1, \ldots, \vec{v}_n\}\) be an orthonormal basis of eigenvectors of \(A^T A\) with eigenvalues \(\lambda_1 \geq \cdots \geq \lambda_n\). Then \(\left\{\frac{A\vec{v}_1}{\sigma_1}, \ldots, \frac{A\vec{v}_r}{\sigma_r}\right\}\) is an orthonormal basis for \(\text{Col}(A)\).

Theorem 10.6.6. Let \(A\) be \(m \times n\) with rank \(r\). There exists an orthonormal basis \(\{\vec{v}_1, \ldots, \vec{v}_n\}\) of \(\mathbb{R}^n\) of right singular vectors and an orthonormal basis \(\{\vec{u}_1, \ldots, \vec{u}_m\}\) of \(\mathbb{R}^m\) of left singular vectors.

Definition (Singular Value Decomposition). A singular value decomposition of an \(m \times n\) matrix \(A\) is a factorization \(A = U \Sigma V^T\), where \(U\) is an \(m \times m\) orthogonal matrix of left singular vectors, \(V\) is an \(n \times n\) orthogonal matrix of right singular vectors, and \(\Sigma\) is the \(m \times n\) matrix with \((\Sigma)_{ii} = \sigma_i\) for \(1 \leq i \leq r\) and all other entries zero.

The algorithm: (1) find eigenvalues and orthonormal eigenvectors of \(A^T A\); (2) form \(V\) and \(\Sigma\); (3) compute \(\vec{u}_i = \frac{1}{\sigma_i} A \vec{v}_i\) for \(1 \leq i \leq r\) and extend to an orthonormal basis for \(\mathbb{R}^m\) to get \(U\).
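The algorithm can be sketched in NumPy and checked against the library's own SVD (an illustration; signs of singular vectors can differ between methods, so the check compares singular values and the reconstruction):

```python
import numpy as np

# Build an SVD of A from the eigendecomposition of A^T A, per the steps above.
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Step 1: eigenvalues/orthonormal eigenvectors of A^T A (eigh sorts ascending).
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]             # re-sort descending
lam, V = lam[order], V[:, order]
sigma = np.sqrt(np.clip(lam, 0.0, None))  # singular values, sigma_1 >= sigma_2 >= ...

# Step 3: u_i = (1/sigma_i) A v_i for the nonzero singular values.
U_cols = [A @ V[:, i] / sigma[i] for i in range(len(sigma)) if sigma[i] > 1e-12]
U_r = np.column_stack(U_cols)

assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))
assert np.allclose(U_r.T @ U_r, np.eye(U_r.shape[1]))  # orthonormal columns
# Here rank(A) = 2 = n, so the compact form U_r diag(sigma) V^T recovers A.
assert np.allclose(U_r @ np.diag(sigma) @ V.T, A)
```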

[Figure: SVD \(A = U\Sigma V^T\) — \(U\) is \(m \times m\) orthogonal (left singular vectors), \(\Sigma\) is \(m \times n\) with non-negative diagonal \(\sigma_1 \geq \cdots \geq \sigma_r > 0\), and \(V\) is \(n \times n\) orthogonal (right singular vectors).]

Chapter 11: Complex Vector Spaces

This chapter extends all results to complex vector spaces: complex diagonalization, Hermitian inner products, unitary matrices, the Spectral Theorem for normal matrices, and the Cayley-Hamilton Theorem.

11.1 Complex Number Review

Definition (Absolute Value). The absolute value (modulus) of \(z = a + bi\) is \(|z| = \sqrt{a^2 + b^2}\).

Theorem 11.1.1. Let \(w, z \in \mathbb{C}\). Then: (1) \(|z|\) is a non-negative real number; (2) \(|z| = 0\) iff \(z = 0\); (3) \(|w + z| \leq |w| + |z|\).

Theorem 11.1.2. For \(z_1, z_2, z_3 \in \mathbb{C}\): (1) commutativity of addition and multiplication; (2) associativity; (3) distributivity; (4) \(|z_1 z_2| = |z_1||z_2|\).

Definition (Complex Conjugate). If \(z = a + bi\), then the complex conjugate is \(\bar{z} = a - bi\).

Theorem 11.1.3. For \(w, z \in \mathbb{C}\) with \(z = a + bi\): (1) \(\overline{\bar{z}} = z\); (2) \(z\) is real iff \(\bar{z} = z\); (3) \(z\) is imaginary iff \(\bar{z} = -z\); (4) \(\overline{z \pm w} = \bar{z} \pm \bar{w}\); (5) \(\overline{zw} = \bar{z}\bar{w}\); (6) \(z + \bar{z} = 2a\); (7) \(z - \bar{z} = 2bi\); (8) \(z\bar{z} = a^2 + b^2 = |z|^2\).

11.2 Complex Vector Spaces

Definition (Complex Vector Space). A set \(V\) is a vector space over \(\mathbb{C}\) if it satisfies all ten vector space axioms (V1–V10), with scalars from \(\mathbb{C}\) instead of \(\mathbb{R}\).

Theorem 11.2.1. If \(\vec{z} \in \mathbb{C}^n\), then there exist \(\vec{x}, \vec{y} \in \mathbb{R}^n\) such that \(\vec{z} = \vec{x} + i\vec{y}\).

Definition (Subspace over \(\mathbb{C}\)). If \(S\) is a subset of a complex vector space \(V\) and \(S\) is itself a complex vector space under the same operations, then \(S\) is a subspace of \(V\).

All concepts (linear independence, spanning, bases, dimension, coordinates, linear mappings, matrices of linear mappings, determinants, invertibility) carry over from the real case, with complex scalars.

Definition (Complex Conjugate of Vectors and Matrices). For \(\vec{z} \in \mathbb{C}^n\), \(\bar{\vec{z}}\) is obtained by conjugating each entry. For \(A \in M_{m \times n}(\mathbb{C})\), \(\bar{A}\) is obtained by conjugating each entry.

Theorem 11.2.2. For \(A \in M_{m \times n}(\mathbb{C})\) and \(\vec{z} \in \mathbb{C}^n\), \(\overline{A\vec{z}} = \bar{A}\bar{\vec{z}}\).

11.3 Complex Diagonalization

Definition (Eigenvalue, Eigenvector over \(\mathbb{C}\)). Let \(A \in M_{n \times n}(\mathbb{C})\). If there exists \(\lambda \in \mathbb{C}\) and \(\vec{z} \in \mathbb{C}^n\), \(\vec{z} \neq \vec{0}\), such that \(A\vec{z} = \lambda \vec{z}\), then \(\lambda\) is an eigenvalue and \(\vec{z}\) is an eigenvector.

All theory of similar matrices, eigenvalues, eigenvectors, and diagonalization from MATH 136 applies with complex eigenvalues and eigenvectors allowed.

Theorem 11.3.1. Let \(A \in M_{n \times n}(\mathbb{R})\). If \(\lambda\) is a non-real eigenvalue of \(A\) with eigenvector \(\vec{z}\), then \(\bar{\lambda}\) is also an eigenvalue of \(A\) with eigenvector \(\bar{\vec{z}}\).

Corollary 11.3.2. If \(A \in M_{n \times n}(\mathbb{R})\) with \(n\) odd, then \(A\) has at least one real eigenvalue.

11.4 Complex Inner Products

The standard dot product on \(\mathbb{C}^n\) does not yield a non-negative real number for \(\vec{z} \cdot \vec{z}\). Instead, we conjugate one argument.

\[\langle \vec{z}, \vec{w} \rangle = \vec{z} \cdot \bar{\vec{w}} = z_1 \bar{w}_1 + \cdots + z_n \bar{w}_n\]

Theorem 11.4.1. The standard inner product on \(\mathbb{C}^n\) satisfies: (1) \(\langle \vec{z}, \vec{z} \rangle \in \mathbb{R}\) and \(\langle \vec{z}, \vec{z} \rangle \geq 0\), with equality iff \(\vec{z} = \vec{0}\); (2) \(\langle \vec{z}, \vec{w} \rangle = \overline{\langle \vec{w}, \vec{z} \rangle}\) (conjugate symmetric); (3) \(\langle \vec{v} + \vec{z}, \vec{w} \rangle = \langle \vec{v}, \vec{w} \rangle + \langle \vec{z}, \vec{w} \rangle\); (4) \(\langle \alpha \vec{z}, \vec{w} \rangle = \alpha \langle \vec{z}, \vec{w} \rangle\).
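These properties can be checked numerically (a NumPy sketch; note that NumPy's `vdot` conjugates its *first* argument, so the course's convention, which conjugates the second, is implemented directly):

```python
import numpy as np

# Standard Hermitian inner product on C^n: <z, w> = z_1 conj(w_1) + ... + z_n conj(w_n).
def hip(z, w):
    return np.sum(z * np.conj(w))

z = np.array([1 + 1j, 2 - 1j])
w = np.array([0 + 1j, 3 + 0j])
alpha = 2 - 3j

assert np.isclose(hip(z, z).imag, 0.0) and hip(z, z).real > 0  # <z,z> real, positive
assert np.isclose(hip(z, w), np.conj(hip(w, z)))               # conjugate symmetry
assert np.isclose(hip(alpha * z, w), alpha * hip(z, w))        # linear in first slot
```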

Definition (Hermitian Inner Product). Let \(V\) be a complex vector space. A Hermitian inner product on \(V\) is a function \(\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}\) satisfying for all \(\vec{v}, \vec{w}, \vec{z} \in V\) and \(\alpha \in \mathbb{C}\): (1) \(\langle \vec{z}, \vec{z} \rangle \in \mathbb{R}\), \(\langle \vec{z}, \vec{z} \rangle \geq 0\), with equality iff \(\vec{z} = \vec{0}\); (2) \(\langle \vec{z}, \vec{w} \rangle = \overline{\langle \vec{w}, \vec{z} \rangle}\); (3) \(\langle \vec{v} + \vec{z}, \vec{w} \rangle = \langle \vec{v}, \vec{w} \rangle + \langle \vec{z}, \vec{w} \rangle\); (4) \(\langle \alpha \vec{z}, \vec{w} \rangle = \alpha \langle \vec{z}, \vec{w} \rangle\).

Theorem 11.4.2. If \(\langle \cdot, \cdot \rangle\) is a Hermitian inner product, then \(\langle \vec{z}, \vec{v} + \vec{w} \rangle = \langle \vec{z}, \vec{v} \rangle + \langle \vec{z}, \vec{w} \rangle\) and \(\langle \vec{z}, \alpha \vec{w} \rangle = \bar{\alpha} \langle \vec{z}, \vec{w} \rangle\).

Theorem 11.4.3. The function \(\langle A, B \rangle = \text{tr}(\bar{B}^T A)\) defines a Hermitian inner product on \(M_{m \times n}(\mathbb{C})\).

Definition (Length, Unit Vector in \(\mathbb{C}^n\)). \(\|\vec{z}\| = \sqrt{\langle \vec{z}, \vec{z} \rangle}\). A vector with \(\|\vec{z}\| = 1\) is a unit vector.

Definition (Orthogonality in \(\mathbb{C}^n\)). Vectors \(\vec{z}, \vec{w}\) are orthogonal if \(\langle \vec{z}, \vec{w} \rangle = 0\).

Theorem 11.4.4. For any \(\vec{z}, \vec{w}\) in a Hermitian inner product space and \(\alpha \in \mathbb{C}\): (1) \(\|\alpha \vec{z}\| = |\alpha| \|\vec{z}\|\); (2) \(\|\vec{z} + \vec{w}\| \leq \|\vec{z}\| + \|\vec{w}\|\); (3) if \(\vec{z} \neq \vec{0}\), then \(\frac{1}{\|\vec{z}\|}\vec{z}\) is a unit vector.

Definition (Orthogonal, Orthonormal Sets in \(\mathbb{C}^n\)). A set \(S = \{\vec{z}_1, \ldots, \vec{z}_k\}\) is orthogonal if \(\langle \vec{z}_\ell, \vec{z}_j \rangle = 0\) for all \(\ell \neq j\). It is orthonormal if additionally \(\|\vec{z}_j\| = 1\) for all \(j\).

Theorem 11.4.5 (Pythagorean Theorem). If \(\{\vec{z}_1, \ldots, \vec{z}_k\}\) is orthogonal, then \(\|\vec{z}_1 + \cdots + \vec{z}_k\|^2 = \|\vec{z}_1\|^2 + \cdots + \|\vec{z}_k\|^2\).

Theorem 11.4.6. An orthogonal set of non-zero vectors in a Hermitian inner product space is linearly independent.

The Gram-Schmidt procedure works for Hermitian inner products as well.

Definition (Unitary Matrix). A matrix \(U \in M_{n \times n}(\mathbb{C})\) whose columns form an orthonormal basis for \(\mathbb{C}^n\) is called unitary.

Theorem 11.4.7. For \(U \in M_{n \times n}(\mathbb{C})\), the following are equivalent: (1) the columns of \(U\) form an orthonormal basis for \(\mathbb{C}^n\); (2) \(\bar{U}^T = U^{-1}\); (3) the rows of \(U\) form an orthonormal basis for \(\mathbb{C}^n\).

Theorem 11.4.8. If \(U\) and \(W\) are unitary, then \(UW\) is unitary.

Definition (Conjugate Transpose). For \(A \in M_{m \times n}(\mathbb{C})\), the conjugate transpose is \(A^* = \bar{A}^T\).

Theorem 11.4.9. For \(A, B \in M_{m \times n}(\mathbb{C})\) and \(\alpha \in \mathbb{C}\): (1) \(\langle A\vec{z}, \vec{w} \rangle = \langle \vec{z}, A^* \vec{w} \rangle\); (2) \((A^*)^* = A\); (3) \((A + B)^* = A^* + B^*\); (4) \((\alpha A)^* = \bar{\alpha} A^*\); (5) \((AB)^* = B^* A^*\).

Lemma 11.4.10 (Proof of Lemma 10.2.2). If \(A\) is a symmetric matrix with real entries, then all eigenvalues are real.

The proof uses the Hermitian inner product: if \(A\vec{z} = \lambda \vec{z}\) with \(\|\vec{z}\| = 1\), then \(\lambda = \langle A\vec{z}, \vec{z} \rangle = \langle \vec{z}, A\vec{z} \rangle = \bar{\lambda}\), so \(\lambda\) is real.

11.5 Unitary Diagonalization

Definition (Hermitian Matrix). A matrix \(A \in M_{n \times n}(\mathbb{C})\) such that \(A^* = A\) is called Hermitian.

Every real symmetric matrix is Hermitian. A Hermitian matrix has real diagonal entries, and the \((\ell, j)\)-entry equals the conjugate of the \((j, \ell)\)-entry.

Theorem 11.5.1. An \(n \times n\) matrix is Hermitian if and only if \(\langle A\vec{z}, \vec{w} \rangle = \langle \vec{z}, A\vec{w} \rangle\) for all \(\vec{z}, \vec{w} \in \mathbb{C}^n\).

Theorem 11.5.2. If \(A\) is Hermitian, then: (1) all eigenvalues of \(A\) are real; (2) eigenvectors corresponding to distinct eigenvalues are orthogonal.

Definition (Unitarily Similar). Matrices \(A\) and \(B\) are unitarily similar if \(U^* A U = B\) for some unitary \(U\).

Definition (Unitarily Diagonalizable). An \(n \times n\) matrix \(A\) is unitarily diagonalizable if it is unitarily similar to a diagonal matrix.

Theorem 11.5.3 (Schur’s Theorem). If \(A\) is an \(n \times n\) matrix, then \(A\) is unitarily similar to an upper triangular matrix whose diagonal entries are the eigenvalues of \(A\).

Theorem 11.5.4 (Spectral Theorem for Hermitian Matrices). If \(A\) is Hermitian, then \(A\) is unitarily diagonalizable.

The proof follows from Schur’s Theorem: \(U^* AU = T\) upper triangular, but \(A^* = A\) forces \(T^* = T\), so \(T\) is diagonal.

Definition (Normal Matrix). An \(n \times n\) matrix \(A\) is called normal if \(AA^* = A^*A\).

Hermitian, skew-Hermitian, unitary, and real orthogonal matrices are all normal.

Theorem 11.5.5 (Spectral Theorem for Normal Matrices). A matrix \(A\) is unitarily diagonalizable if and only if \(A\) is normal.

Theorem 11.5.6. If \(A\) is normal, then: (1) \(\|A\vec{z}\| = \|A^* \vec{z}\|\) for all \(\vec{z} \in \mathbb{C}^n\); (2) \(A - \lambda I\) is normal for every \(\lambda \in \mathbb{C}\); (3) if \(A\vec{z} = \lambda \vec{z}\), then \(A^* \vec{z} = \bar{\lambda} \vec{z}\).

11.6 Cayley-Hamilton Theorem

Theorem (Cayley-Hamilton Theorem). Let \(A\) be an \(n \times n\) matrix with characteristic polynomial \(C(\lambda) = c_n \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1 \lambda + c_0\). Then

\[C(A) = c_n A^n + c_{n-1} A^{n-1} + \cdots + c_1 A + c_0 I = O\]

The proof uses Schur’s Theorem: write \(U^* AU = T\) upper triangular with eigenvalues \(\lambda_1, \ldots, \lambda_n\) on the diagonal. Then \(C(T) = (-1)^n(T - \lambda_1 I)(T - \lambda_2 I) \cdots (T - \lambda_n I)\), and one shows by examining columns that this product is the zero matrix. Since \(C(T) = U^* C(A) U\), we conclude \(C(A) = O\).
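The theorem is easy to verify numerically (a NumPy sketch; `np.poly` returns the characteristic-polynomial coefficients with the highest power first):

```python
import numpy as np

# Check the Cayley-Hamilton Theorem: substituting A into its own
# characteristic polynomial gives the zero matrix.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
n = A.shape[0]

c = np.poly(A)   # coefficients, highest power first; for this A: [1, -5, -2]

# Evaluate C(A) = c[0] A^n + c[1] A^{n-1} + ... + c[n] I.
C_A = sum(c[k] * np.linalg.matrix_power(A, n - k) for k in range(n + 1))

assert np.allclose(C_A, np.zeros((n, n)))
```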
