
Section 4.2 Diagonalization of symmetric matrices

Recall that an \(n\times n\) matrix \(A\) is symmetric if \(A^T=A\text{.}\) Symmetry of \(A\) is equivalent to the following: for any vectors \(\xx,\yy\in\R^n\text{,}\)
\begin{equation*} \xx\dotp (A\yy) = (A\xx)\dotp \yy\text{.} \end{equation*}
To see that this is implied by the symmetry of \(A\text{,}\) note that
\begin{equation*} \xx\dotp (A\yy) = \xx^T(A\yy)=(\xx^TA^T)\yy = (A\xx)^T\yy=(A\xx)\dotp\yy\text{.} \end{equation*}
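As a quick numerical sanity check of this identity (not part of the text; the matrix and vectors below are arbitrary choices, assuming numpy is available):

```python
import numpy as np

# Check the identity x . (Ay) = (Ax) . y for a symmetric matrix A.
# The matrix and vectors here are arbitrary, used only for illustration.
rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A = B + B.T                      # B + B^T is always symmetric

x = rng.standard_normal(3)
y = rng.standard_normal(3)

lhs = x @ (A @ y)                # x . (Ay)
rhs = (A @ x) @ y                # (Ax) . y
print(np.isclose(lhs, rhs))      # True
```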

Exercise 4.2.1.

Prove that if \(\xx\dotp(A\yy)=(A\xx)\dotp \yy\) for any \(\xx,\yy\in\R^n\text{,}\) then \(A\) is symmetric.
Hint.
If this condition is true for all \(\xx,\yy\in \R^n\text{,}\) then it is true in particular for the vectors in the standard basis for \(\R^n\text{.}\)
A useful property of symmetric matrices, mentioned earlier, is that eigenvectors corresponding to distinct eigenvalues are orthogonal.
We want to show that if \(\xx_1,\xx_2\) are eigenvectors corresponding to distinct eigenvalues \(\lambda_1,\lambda_2\text{,}\) then \(\xx_1\dotp \xx_2=0\text{.}\) Since \(A\) is symmetric, the identity above tells us that \((A\xx_1)\dotp \xx_2=\xx_1\dotp (A\xx_2)\text{.}\) Can you see how to use this, together with the fact that \(\xx_1,\xx_2\) are eigenvectors, to prove the result?
To carry out this argument, suppose \(A\) is symmetric, and that we have
\begin{equation*} A\xx_1=\lambda_1\xx_1\quad \text{ and } A\xx_2=\lambda_2\xx_2\text{,} \end{equation*}
with \(\xx_1\neq\zer,\xx_2\neq \zer\text{,}\) and \(\lambda_1\neq \lambda_2\text{.}\) We then have, since \(A\) is symmetric, and using the result above,
\begin{equation*} \lambda_1(\xx_1\dotp \xx_2) = (\lambda_1\xx_1)\dotp \xx_2 = (A\xx_1)\dotp \xx_2 = \xx_1\dotp(A\xx_2) = \xx_1\dotp(\lambda_2\xx_2) = \lambda_2(\xx_1\dotp\xx_2)\text{.} \end{equation*}
It follows that \((\lambda_1-\lambda_2)(\xx_1\dotp \xx_2)=0\text{,}\) and since \(\lambda_1\neq \lambda_2\text{,}\) we must have \(\xx_1\dotp \xx_2=0\text{.}\)
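For instance (a small illustration rather than an example from the text), the symmetric matrix \(\bbm 2\amp 1\\1\amp 2\ebm\) satisfies
\begin{equation*} \bbm 2\amp 1\\1\amp 2\ebm\bbm 1\\1\ebm = 3\bbm 1\\1\ebm \quad \text{ and } \quad \bbm 2\amp 1\\1\amp 2\ebm\bbm 1\\-1\ebm = \bbm 1\\-1\ebm\text{,} \end{equation*}
and the eigenvectors for the distinct eigenvalues \(3\) and \(1\) are indeed orthogonal: \(\bbm 1\\1\ebm\dotp \bbm 1\\-1\ebm = 0\text{.}\)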
The procedure for diagonalizing a matrix is as follows: assuming that \(\dim E_\lambda(A)\) is equal to the multiplicity of \(\lambda\) for each distinct eigenvalue \(\lambda\text{,}\) we find a basis for \(E_\lambda(A)\text{.}\) The union of the bases for each eigenspace is then a basis of eigenvectors for \(\R^n\text{,}\) and the matrix \(P\) whose columns are those eigenvectors will satisfy \(P^{-1}AP = D\text{,}\) where \(D\) is a diagonal matrix whose diagonal entries are the eigenvalues of \(A\text{.}\)
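A minimal computational sketch of this procedure (assuming numpy, with a small matrix chosen only for illustration):

```python
import numpy as np

# A small diagonalizable matrix, chosen only for illustration.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of P are eigenvectors; the corresponding eigenvalues go on the
# diagonal of D.
eigvals, P = np.linalg.eig(A)
D = np.diag(eigvals)

# Check that P^{-1} A P = D.
print(np.allclose(np.linalg.inv(P) @ A @ P, D))  # True
```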
If \(A\) is symmetric, we know that eigenvectors from different eigenspaces will be orthogonal to each other. If we further choose an orthogonal basis of eigenvectors for each eigenspace (which is possible via the Gram-Schmidt procedure), then we can construct an orthogonal basis of eigenvectors for \(\R^n\text{.}\) Furthermore, if we normalize each vector, then we’ll have an orthonormal basis. The matrix \(P\) whose columns consist of these orthonormal basis vectors has a name.

Definition 4.2.3.

A matrix \(P\) is called orthogonal if \(P^T = P^{-1}\text{.}\)
This terminology is justified by the fact that \(P^T=P^{-1}\) if and only if \(P^TP=I\text{,}\) together with the observation that the \((i,j)\) entry of \(P^TP\) is the dot product of the \(i\)th and \(j\)th columns of \(P\text{:}\) thus \(P^TP=I\) exactly when the columns of \(P\) are orthonormal.
A fun fact is that for a square matrix \(P\text{,}\) if the columns are orthonormal, then so are the rows. This is not true, however, if the columns are merely orthogonal. For example, the columns of \(A = \bbm 1\amp 0\amp 5\\-2\amp 1\amp 2\\1\amp 2\amp -1\ebm \) are orthogonal, but (as you can check) the rows are not. If we normalize the columns, however, we get
\begin{equation*} P = \bbm 1/\sqrt{6}\amp 0 \amp 1/\sqrt{30}\\-2/\sqrt{6}\amp 1/\sqrt{5}\amp 2/\sqrt{30}\\1/\sqrt{6}\amp 2/\sqrt{5}\amp -1/\sqrt{30}\ebm\text{,} \end{equation*}
which, as you can confirm, is an orthogonal matrix.
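If you would like to confirm this numerically (a quick check, assuming numpy), you can verify that \(P^TP=I\) and \(PP^T=I\text{:}\)

```python
import numpy as np

# The matrix P obtained above by normalizing the columns of A.
P = np.array([[ 1/np.sqrt(6), 0,            1/np.sqrt(30)],
              [-2/np.sqrt(6), 1/np.sqrt(5),  2/np.sqrt(30)],
              [ 1/np.sqrt(6), 2/np.sqrt(5), -1/np.sqrt(30)]])

# P is orthogonal: P^T P = I, and (since P is square) P P^T = I as well,
# so the rows of P are also orthonormal.
print(np.allclose(P.T @ P, np.eye(3)))  # True
print(np.allclose(P @ P.T, np.eye(3)))  # True
```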

Definition 4.2.5.

An \(n\times n\) matrix \(A\) is said to be orthogonally diagonalizable if there exists an orthogonal matrix \(P\) such that \(P^TAP\) is diagonal.
The above definition leads to the following result, also known as the Principal Axes Theorem: an \(n\times n\) matrix \(A\) is symmetric if and only if it is orthogonally diagonalizable. A careful proof is quite difficult, and omitted from this book. The hard part is showing that any symmetric matrix is orthogonally diagonalizable. There are a few ways to do this, most requiring induction on the size of the matrix. A common approach actually uses multivariable calculus! (Optimization via Lagrange multipliers, to be precise.) If you are reading this along with the book by Nicholson, note that there is a gap in his proof: in the induction step, he assumes the existence of a real eigenvalue of \(A\text{,}\) but this has to be proved!
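Computationally, the construction described earlier can be carried out as in the following sketch (assuming numpy, with a matrix chosen only for illustration): we orthonormalize a basis of a repeated eigenspace and append a unit eigenvector for the remaining eigenvalue; numpy's eigh routine performs the whole orthogonal diagonalization of a symmetric matrix in one call.

```python
import numpy as np

# A symmetric matrix with eigenvalues 4 (once) and 1 (twice),
# chosen only for illustration.
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

# A (non-orthogonal) basis for the eigenspace E_1(A) = {x : x1 + x2 + x3 = 0}.
V = np.array([[ 1.0,  1.0],
              [-1.0,  0.0],
              [ 0.0, -1.0]])

# Orthonormalize within the eigenspace (QR factorization does Gram-Schmidt).
Q1, _ = np.linalg.qr(V)

# A unit eigenvector for the eigenvalue 4.
q2 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)

# Assemble the orthonormal eigenvectors into P.
P = np.column_stack([Q1, q2])
print(np.allclose(P.T @ P, np.eye(3)))         # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))               # diag(1, 1, 4)

# numpy's eigh routine does all of this in one step for symmetric matrices.
w, Q = np.linalg.eigh(A)
print(np.allclose(Q.T @ A @ Q, np.diag(w)))    # True
```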

Exercise 4.2.7.

Determine the eigenvalues of \(A=\bbm 5\amp -2\amp -4\\-2\amp 8\amp -2\\-4\amp -2\amp 5\ebm\text{,}\) and find an orthogonal matrix \(P\) such that \(P^TAP\) is diagonal.
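Once you have a candidate \(P\text{,}\) one way to check your work (a suggestion only, assuming numpy; the helper function below is not part of the exercise) is to verify the two defining conditions directly:

```python
import numpy as np

A = np.array([[ 5.0, -2.0, -4.0],
              [-2.0,  8.0, -2.0],
              [-4.0, -2.0,  5.0]])

def is_orthogonal_diagonalization(A, P):
    """Return True if P is orthogonal and P^T A P is diagonal."""
    PtAP = P.T @ A @ P
    orthogonal = np.allclose(P.T @ P, np.eye(A.shape[0]))
    diagonal = np.allclose(PtAP, np.diag(np.diag(PtAP)))
    return orthogonal and diagonal

# Usage, once you have computed your own P:
# print(is_orthogonal_diagonalization(A, P))
```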