
Section 4.4 Diagonalization of complex matrices

Recall that when we first defined vector spaces, we mentioned that a vector space can be defined over any field \(\mathbb{F}\text{.}\) To keep things simple, we’ve mostly assumed \(\mathbb{F}=\mathbb{R}\text{.}\) But most of the theorems and proofs we’ve encountered go through unchanged if we work over a general field. (This is not quite true: over a finite field things can get more complicated. For example, if \(\mathbb{F}=\mathbb{Z}_2=\{0,1\}\text{,}\) then we get weird results like \(\vv+\vv=\zer\text{,}\) since \(1+1=0\text{.}\))
In fact, if we replace \(\R\) by \(\C\text{,}\) about the only thing we’d have to go back and change is the definition of the dot product. The reason for this is that although the complex numbers seem computationally more complicated (perhaps mostly because you don’t use them often enough), they follow the exact same algebraic rules as the real numbers. In other words, the arithmetic might be different, but the algebra is the same. There is one key difference between the two fields: over the complex numbers, every polynomial can be factored completely into linear factors. This is important if you’re interested in finding eigenvalues.
This section is written based on the assumption that complex numbers were covered in a previous course. If this was not the case, or to review this material, see Appendix A before proceeding.

Subsection 4.4.1 Complex vectors

A complex vector space is simply a vector space where the scalars are elements of \(\C\) rather than \(\R\text{.}\) Examples include polynomials with complex coefficients, complex-valued functions, and \(\C^n\text{,}\) which is defined exactly how you think it should be. In fact, one way to obtain \(\C^n\) is to start with the exact same standard basis we use for \(\R^n\text{,}\) and then take linear combinations using complex scalars.
We’ll write elements of \(\C^n\) as \(\zz = (z_1,z_2,\ldots, z_n)\text{.}\) The complex conjugate of \(\zz\) is given by
\begin{equation*} \bar{\zz} = (\bz_1,\bz_2,\ldots, \bz_n)\text{.} \end{equation*}
The standard inner product on \(\C^n\) looks a lot like the dot product on \(\R^n\text{,}\) with one important difference: we apply a complex conjugate to the second vector.

Definition 4.4.1.

The standard inner product on \(\C^n\) is defined as follows: given \(\zz=(z_1,z_2,\ldots, z_n)\) and \(\ww=(w_1,w_2,\ldots, w_n)\text{,}\)
\begin{equation*} \langle \zz,\ww\rangle = \zz\dotp\bar{\ww} = z_1\bar{w}_1+z_2\bar{w}_2+\cdots + z_n\bar{w}_n\text{.} \end{equation*}
If \(\zz,\ww\) are real, this is just the usual dot product. The reason for using the complex conjugate is to ensure that we still have a positive-definite inner product on \(\C^n\text{:}\)
\begin{equation*} \langle \zz,\zz\rangle = z_1\bz_1+z_2\bz_2+\cdots + z_n\bz_n = \abs{z_1}^2+\abs{z_2}^2+\cdots + \abs{z_n}^2\text{,} \end{equation*}
which shows that \(\langle \zz,\zz\rangle \geq 0\text{,}\) and \(\langle \zz,\zz\rangle = 0\) if and only if \(\zz=\zer\text{.}\)

Exercise 4.4.2.

Compute the inner product \(\langle \zz,\ww\rangle\) of \(\zz = (2-i, 3i, 4+2i)\) and \(\ww = (3i,4-5i,-2+2i)\text{.}\)
This isn’t hard to do by hand, but it’s useful to know how to ask the computer to do it, too. Unfortunately, the dot product in SymPy does not include the complex conjugate. One likely reason for this is that while most mathematicians take the complex conjugate of the second vector, some mathematicians, and most physicists, put the conjugate on the first vector. So they may have decided to remain agnostic about this choice. We can manually apply the conjugate, using Z.dot(W.H). (The .H operation is the hermitian conjugate; see Definition 4.4.6 below.)
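For instance, entering the vectors from the exercise as column matrices, such a cell might look like this (a minimal sketch; the names Z and W are the ones used below):

    from sympy import Matrix, I

    Z = Matrix([2 - I, 3*I, 4 + 2*I])
    W = Matrix([3*I, 4 - 5*I, -2 + 2*I])
    Z.dot(W.H)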
Again, you might want to wrap that last term in simplify() (in which case you’ll get \(-22-6i\) for the inner product). Above, we saw that the complex inner product is designed to be positive definite, like the real inner product. The remaining properties of the complex inner product are collected in the following theorem.

Theorem 4.4.3.

For all \(\zz_1,\zz_2,\zz_3\in\C^n\) and \(\alpha\in\C\text{:}\)
  1. \(\langle \zz_1+\zz_2,\zz_3\rangle = \langle \zz_1,\zz_3\rangle + \langle \zz_2,\zz_3\rangle\) and \(\langle \zz_1,\zz_2+\zz_3\rangle = \langle \zz_1,\zz_2\rangle + \langle \zz_1,\zz_3\rangle\text{.}\)
  2. \(\langle \alpha\zz_1,\zz_2\rangle = \alpha\langle \zz_1,\zz_2\rangle\) and \(\langle \zz_1,\alpha\zz_2\rangle = \bar{\alpha}\langle \zz_1,\zz_2\rangle\text{.}\)
  3. \(\overline{\langle \zz_1,\zz_2\rangle} = \langle \zz_2,\zz_1\rangle\text{.}\)
  4. \(\langle \zz_1,\zz_1\rangle \geq 0\text{,}\) and \(\langle \zz_1,\zz_1\rangle = 0\) if and only if \(\zz_1=\zer\text{.}\)

The proof of each property is as follows.
  1. Using the distributive properties of matrix multiplication and the transpose,
    \begin{align*} \langle \zz_1+\zz_2,\zz_3\rangle \amp= (\zz_1+\zz_2)^T\bar{\zz_3}\\ \amp =(\zz_1^T+\zz_2^T)\bar{\zz_3}\\ \amp =\zz_1^T\bar{\zz_3}+\zz_2^T\bar{\zz_3}\\ \amp =\langle \zz_1,\zz_3\rangle + \langle \zz_2,\zz_3\rangle\text{.} \end{align*}
    The proof is similar when addition is in the second component. (But not identical -- you’ll need the fact that the complex conjugate is distributive, rather than the transpose.)
  2. These again follow from writing the inner product as a matrix product.
    \begin{equation*} \langle \alpha\zz_1,\zz_2\rangle = (\alpha \zz_1)^T\bar{\zz_2} = \alpha(\zz_1^T\bar{\zz_2}) = \alpha\langle\zz_1,\zz_2\rangle\text{,} \end{equation*}
    and
    \begin{equation*} \langle \zz_1,\alpha\zz_2\rangle = \zz_1^T\overline{\alpha \zz_2} = \zz_1^T(\bar{\alpha}\bar{\zz_2}) = \bar{\alpha}(\zz_1^T\bar{\zz_2})=\bar{\alpha}\langle \zz_1,\zz_2\rangle\text{.} \end{equation*}
  3. Note that for any vectors \(\zz,\ww\text{,}\) \(\zz^T\ww\) is a number, and therefore equal to its own transpose. Thus, we have \(\zz^T\ww = (\zz^T\ww)^T=\ww^T\zz\text{,}\) and
    \begin{equation*} \overline{\langle \zz_1,\zz_2\rangle} = \overline{\zz_1^T\bar{\zz_2}} = \overline{\bar{\zz_2}^T\zz_1} = \zz_2^T\overline{\zz_1}=\langle \zz_2,\zz_1\rangle\text{.} \end{equation*}
  4. This was already demonstrated above.

Definition 4.4.4.

The norm of a vector \(\zz = (z_1,z_2,\ldots, z_n)\) in \(\C^n\) is given by
\begin{equation*} \len{\zz} = \sqrt{\langle \zz,\zz\rangle} = \sqrt{\abs{z_1}^2+\abs{z_2}^2+\cdots +\abs{z_n}^2}\text{.} \end{equation*}
Note that much like the real norm, the complex norm satisfies \(\len{\alpha\zz}=\abs{\alpha}\len{\zz}\) for any (complex) scalar \(\alpha\text{.}\)
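In SymPy, the norm method computes exactly this quantity. A quick sketch, with a vector of my own choosing:

    from sympy import Matrix, I

    z = Matrix([1 + I, 2 - I])
    z.norm()   # sqrt(7), since |1+i|^2 + |2-i|^2 = 2 + 5 = 7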

Exercise 4.4.5.

    True or false: the norm of a complex vector is always a real number.
  • True.

  • False.

The statement is true: since the norm is computed using the modulus, which is always real and non-negative, the norm will be a real number as well. If you ever get a complex number for your norm, you’ve probably forgotten the complex conjugate somewhere.

Subsection 4.4.2 Complex matrices

Linear transformations are defined in exactly the same way, and a complex matrix is simply a matrix whose entries are complex numbers. There are two important operations defined on complex matrices: the conjugate, and the conjugate transpose (also known as the hermitian transpose).

Definition 4.4.6.

The conjugate of a matrix \(A=[a_{ij}]\in M_{mn}(\C)\) is the matrix \(\bar{A}=[\bar{a}_{ij}]\text{.}\) The conjugate transpose of \(A\) is the matrix \(A^H\) defined by
\begin{equation*} A^H = (\bar{A})^T=\overline{(A^T)}\text{.} \end{equation*}
Note that many textbooks use the notation \(A^\dagger\) for the conjugate transpose.
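Both operations are available in SymPy as matrix methods. A small sketch, on a made-up matrix:

    from sympy import Matrix, I

    A = Matrix([[1 + I, 2], [3*I, 4 - I]])
    A.conjugate()   # entrywise complex conjugate
    A.T             # transpose
    A.H             # conjugate transpose, equal to A.conjugate().T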

Definition 4.4.7.

An \(n\times n\) matrix \(A\in M_{nn}(\C)\) is called hermitian if \(A^H = A\text{,}\) and unitary if \(A^H = A^{-1}\text{.}\) (A matrix is skew-hermitian if \(A^H=-A\text{.}\))
Hermitian and unitary matrices (or more accurately, linear operators) are very important in quantum mechanics. Indeed, hermitian matrices represent “observable” quantities, in part because their eigenvalues are real, as we’ll soon see. For us, hermitian and unitary matrices can simply be viewed as the complex counterparts of symmetric and orthogonal matrices, respectively. In fact, a real symmetric matrix is hermitian, since the conjugate has no effect on it, and similarly, a real orthogonal matrix is technically unitary. As with orthogonal matrices, a unitary matrix can also be characterized by the property that its rows and columns both form orthonormal bases.

Exercise 4.4.8.

Show that the matrix \(A = \bbm 4\amp 1-i\amp -2+3i\\1+i\amp 5 \amp 7i\\-2-3i\amp -7i\amp -4\ebm\) is hermitian, and that the matrix \(B = \dfrac12\bbm 1+i\amp \sqrt{2}\\1-i\amp\sqrt{2}i\ebm\) is unitary.
When using SymPy, the hermitian conjugate of a matrix A is computed using A.H. (There also appears to be an equivalent operation named Dagger in sympy.physics.quantum, but I’ve had more success with .H.) The complex unit is entered as I. So for the exercise above, we can do the following.
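A cell along these lines (a sketch, with A entered row by row) does the job:

    from sympy import Matrix, I

    A = Matrix([[4, 1 - I, -2 + 3*I],
                [1 + I, 5, 7*I],
                [-2 - 3*I, -7*I, -4]])
    A == A.H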
The last line verifies that \(A=A^H\text{.}\) We could also replace it with A,A.H to explicitly see the two matrices side by side. Now, let’s confirm that \(B\) is unitary.
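One way such a cell might look, with B defined using sqrt and I:

    from sympy import Matrix, I, sqrt

    B = Matrix([[1 + I, sqrt(2)],
                [1 - I, sqrt(2)*I]]) / 2
    B, B*B.H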
Hmm... That doesn’t look like the identity on the right. Maybe try replacing B*B.H with simplify(B*B.H). (You will want to add from sympy import simplify at the top of the cell.) Or you could try B.H, B**-1 to compare results. Actually, what’s interesting is that in a Sage cell, B.H == B**-1 yields False, but B.H == simplify(B**-1) yields True!
As mentioned above, hermitian matrices are the complex analogue of symmetric matrices. Recall that a key property of a symmetric matrix is its symmetry with respect to the dot product: for a symmetric matrix \(A\text{,}\) we had \(\mathbf{x}\dotp (A\mathbf{y})=(A\mathbf{x})\dotp \mathbf{y}\text{.}\) Hermitian matrices exhibit the same behaviour with respect to the complex inner product.

Theorem 4.4.9.

An \(n\times n\) matrix \(A\) is hermitian if and only if \(\langle A\zz,\ww\rangle = \langle \zz,A\ww\rangle\) for all \(\zz,\ww\in\C^n\text{.}\)

To prove this, note that the property \(A^H=A\) is equivalent to \(A^T=\bar{A}\text{.}\) This gives us
\begin{equation*} \langle A\zz,\ww\rangle = (A\zz)^T\bar{\ww} = (\zz^TA^T)\bar{\ww} = (\zz^T\bar{A})\bar{\ww}=\zz^T(\overline{A\ww}) = \langle \zz,A\ww\rangle\text{.} \end{equation*}
Conversely, suppose \(\langle A\zz,\ww\rangle = \langle \zz, A\ww\rangle\) for all \(\zz,\ww\in \C^n\text{,}\) and let \(\basis{e}{n}\) denote the standard basis for \(\C^n\text{.}\) Then
\begin{equation*} a_{ji}=\langle A\mathbf{e}_i,\mathbf{e}_j\rangle = \langle \mathbf{e}_i,A\mathbf{e}_j\rangle = \overline{a_{ij}}\text{,} \end{equation*}
which shows that \(A^T=\bar{A}\text{.}\)
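As a quick numerical sanity check, we can verify the identity for the hermitian matrix of Exercise 4.4.8 (the vectors here are my own arbitrary choices):

    from sympy import Matrix, I, simplify

    A = Matrix([[4, 1 - I, -2 + 3*I],
                [1 + I, 5, 7*I],
                [-2 - 3*I, -7*I, -4]])
    z = Matrix([1, I, 2 - I])
    w = Matrix([2 + I, 3, -I])
    # <Az, w> minus <z, Aw>: should simplify to 0, since A is hermitian
    simplify((A*z).dot(w.H) - z.dot((A*w).H))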
Next, we’ve noted that one advantage of doing linear algebra over \(\C\) is that every polynomial can be completely factored, including the characteristic polynomial. This means that we can always find eigenvalues for a matrix. When that matrix is hermitian, we get a surprising result.

Theorem 4.4.10.

Let \(A\) be a hermitian matrix. Then:
  1. The eigenvalues of \(A\) are real.
  2. Eigenvectors of \(A\) corresponding to distinct eigenvalues are orthogonal.

The proof of each claim follows.
  1. Suppose \(A\zz = \lambda\zz\) for some \(\lambda\in\C\) and \(\zz\neq \zer\text{.}\) Then
    \begin{equation*} \lambda \langle \zz,\zz\rangle = \langle \lambda\zz,\zz\rangle = \langle A\zz,\zz \rangle = \langle \zz, A\zz\rangle = \langle \zz,\lambda\zz\rangle = \bar{\lambda}\langle \zz,\zz\rangle\text{.} \end{equation*}
    Thus, \((\lambda-\bar{\lambda})\len{\zz}^2=0\text{,}\) and since \(\len{\zz}\neq 0\text{,}\) we must have \(\bar{\lambda}=\lambda\text{,}\) which means \(\lambda\in\R\text{.}\)
  2. Similarly, suppose \(\lambda_1,\lambda_2\) are distinct eigenvalues of \(A\text{,}\) with corresponding eigenvectors \(\zz,\ww\text{.}\) Then
    \begin{equation*} \lambda_1\langle \zz,\ww\rangle = \langle \lambda_1\zz,\ww\rangle = \langle A\zz,\ww\rangle =\langle \zz,A\ww\rangle = \langle \zz,\lambda_2\ww\rangle = \bar{\lambda_2}\langle\zz,\ww\rangle\text{.} \end{equation*}
    This gives us \((\lambda_1-\bar{\lambda_2})\langle \zz,\ww\rangle=0\text{.}\) And since we already know \(\lambda_2\) must be real, and \(\lambda_1\neq \lambda_2\text{,}\) we must have \(\langle \zz,\ww\rangle = 0\text{.}\)
In light of Theorem 4.4.10, we realize that diagonalization of hermitian matrices will follow the same script as for symmetric matrices. Indeed, the Gram-Schmidt Orthonormalization Algorithm applies equally well in \(\C^n\text{,}\) as long as we replace the dot product with the complex inner product. This suggests the following.

Theorem 4.4.11.

If \(A\in M_{nn}(\C)\) is hermitian, then there exists a unitary matrix \(U\) such that \(U^HAU\) is diagonal, with the (necessarily real) eigenvalues of \(A\) as its diagonal entries.

Exercise 4.4.12.

Confirm that the matrix \(A = \bbm 4 \amp 3-i\\3+i\amp 1\ebm\) is hermitian. Then, find the eigenvalues of \(A\text{,}\) and a unitary matrix \(U\) such that \(U^HAU\) is diagonal.
To do the above exercise using SymPy, we first define \(A\) and ask for the eigenvectors.
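A cell like the following (a sketch) does this:

    from sympy import Matrix, I

    A = Matrix([[4, 3 - I], [3 + I, 1]])
    A.eigenvects()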
We can now manually determine the matrix \(U\text{,}\) as we did above, and input it:
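One possibility, scaling the eigenvectors for \(\lambda_1=-1\) and \(\lambda_2=6\) to clear denominators before normalizing:

    from sympy import Matrix, I, sqrt

    # Columns: (-3+i, 5)/sqrt(35) for lambda = -1, (3-i, 2)/sqrt(14) for lambda = 6
    U = Matrix([[(-3 + I)/sqrt(35), (3 - I)/sqrt(14)],
                [5/sqrt(35), 2/sqrt(14)]])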
To confirm it’s unitary, add the line U*U.H to the above, and confirm that you get the identity matrix as output. You might need to use simplify(U*U.H) if the result is not clear. Now, to confirm that \(U^HAU\) really is diagonal, go back to the cell above, and enter it. Try (U.H)*A*U, just to remind yourself that adding the simplify command is often a good idea.
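Putting that together, and assuming A and U from the cells above, the checks might read:

    from sympy import simplify

    simplify(U*U.H)     # the identity matrix, so U is unitary
    simplify(U.H*A*U)   # diagonal, with entries -1 and 6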
If you want to cut down on the manual labour involved, we can make use of some of the other tools SymPy provides. In the next cell, we’re going to assign the output of A.eigenvects() to a list. The only trouble is that the output of the eigenvects command is a list of triples: each item has the form (eigenvalue, multiplicity, [eigenvectors]).
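So the cell might begin like this (assuming A is defined as above):

    L = A.eigenvects()
    L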
Try the above modifications, in sequence. First, replacing the second line by L[0] will give the first list item, which is another list:
\begin{equation*} \left(-1,\ 1,\ \left[\bbm -\frac35+\frac{i}{5}\\ 1\ebm\right]\right)\text{.} \end{equation*}
We want the third item in the list, so try (L[0])[2]. But note the extra set of brackets! There could (in theory) be more than one eigenvector, so this is a list with one item. To finally get the vector out, try ((L[0])[2])[0]. (There is probably a better way to do this. Someone who is more fluent in Python is welcome to advise.)
Now that we know how to extract the eigenvectors, we can normalize them, and join them to make a matrix. The norm of a vector is simply v.norm(), and to join column vectors u1 and u2 to make a matrix, we can use the command u1.row_join(u2). We already defined the matrix A and list L above, but here is the whole routine in one cell, in case you didn’t run all the cells above.
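One version of that cell:

    from sympy import Matrix, I, simplify

    A = Matrix([[4, 3 - I], [3 + I, 1]])
    L = A.eigenvects()

    # Extract one eigenvector for each eigenvalue
    v1 = ((L[0])[2])[0]
    v2 = ((L[1])[2])[0]

    # Normalize, then join the columns into a single matrix
    u1 = v1/v1.norm()
    u2 = v2/v2.norm()
    U = u1.row_join(u2)

    simplify(U.H*A*U)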
Believe me, you want the simplify command on that last matrix.
While Theorem 4.4.11 guarantees that any hermitian matrix can be “unitarily diagonalized”, there are also non-hermitian matrices for which this can be done as well. A classic example of this is the rotation matrix \(\bbm 0\amp 1\\-1\amp 0\ebm\text{.}\) This is a real matrix with complex eigenvalues \(\pm i\text{,}\) and while it is neither symmetric nor hermitian, it can be unitarily diagonalized. This should be contrasted with the real spectral theorem, where any matrix that can be orthogonally diagonalized is necessarily symmetric.
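We can check this with SymPy; the diagonalizing matrix used here is the same unitary matrix that appears in Exercise 4.4.14 below:

    from sympy import Matrix, I, sqrt, simplify

    R = Matrix([[0, 1], [-1, 0]])
    U = Matrix([[1, 1], [I, -I]])/sqrt(2)
    simplify(U.H*R*U)   # Matrix([[I, 0], [0, -I]])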
This suggests that perhaps hermitian matrices are not quite the correct class of matrix for which the spectral theorem should be stated. Indeed, it turns out there is a somewhat more general class of matrix: the normal matrices.

Definition 4.4.13.

An \(n\times n\) matrix \(A\) is normal if \(A^HA = AA^H\text{.}\)

Exercise 4.4.14.

    Select all matrices below that are normal.
  • \(\begin{bmatrix} 3\amp 1-3i\\ 1+3i\amp -4\end{bmatrix}\)
  • This matrix is hermitian, and we know that every hermitian matrix is normal.
  • \(\begin{bmatrix} 1\amp 3\\ 0 \amp 2\end{bmatrix}\)
  • This matrix is not normal; this can be confirmed by direct computation, or by noting that although it is diagonalizable (its eigenvalues \(1\) and \(2\) are distinct), its eigenvectors are not orthogonal, so it cannot be unitarily diagonalized.
  • \(\frac{1}{\sqrt{2}}\begin{bmatrix} 1\amp 1\\ i \amp -i\end{bmatrix} \)
  • This matrix is unitary, and every unitary matrix is normal.
  • \(\begin{bmatrix} i \amp 2i\\ 2i \amp 3i\end{bmatrix}\)
  • This matrix is neither hermitian nor unitary, but it is normal, which can be verified by direct computation (see the sketch below).
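For that last matrix, the direct computation can be done as follows:

    from sympy import Matrix, I

    A = Matrix([[I, 2*I], [2*I, 3*I]])
    A.H*A == A*A.H   # True, so A is normal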
It turns out that a matrix \(A\) is normal if and only if \(A=UDU^H\) for some unitary matrix \(U\) and diagonal matrix \(D\text{.}\) A further generalization is known as Schur’s Theorem.

Theorem 4.4.15.

For any matrix \(A\in M_{nn}(\C)\text{,}\) there exists a unitary matrix \(U\) such that \(T=U^HAU\) is upper triangular, and the diagonal entries of \(T\) are the eigenvalues of \(A\text{.}\)
Using Schur’s Theorem, we can obtain a famous result, known as the Cayley-Hamilton Theorem, for the case of complex matrices. (It is true for real matrices as well, but we don’t yet have the tools to prove it.) The Cayley-Hamilton Theorem states that substituting any matrix into its characteristic polynomial results in the zero matrix. To understand this result, we should first explain how to define a polynomial of a matrix.
Given a polynomial \(p(x) = a_0+a_1x+\cdots + a_nx^n\text{,}\) we define \(p(A)\) as
\begin{equation*} p(A) = a_0I+a_1A+\cdots + a_nA^n\text{.} \end{equation*}
(Note the presence of the identity matrix in the first term, since it does not make sense to add a scalar to a matrix.) Note further that since \((P^{-1}AP)^n = P^{-1}A^nP\) for any invertible matrix \(P\) and positive integer \(n\text{,}\) we have \(p(U^HAU)=U^Hp(A)U\) for any polynomial \(p\) and unitary matrix \(U\text{.}\)
By Theorem 4.4.15, there exists a unitary matrix \(U\) such that \(A = UTU^H\text{,}\) where \(T\) is upper triangular, and has the eigenvalues of \(A\) as diagonal entries. Since \(c_A(A)=c_A(UTU^H)=Uc_A(T)U^H\text{,}\) and \(c_A(x)=c_T(x)\) (since \(A\) and \(T\) are similar), it suffices to show that \(c_A(A)=0\) when \(A\) is upper triangular. (If you like, we are showing that \(c_T(T)=0\text{,}\) and deducing that \(c_A(A)=0\text{.}\)) But if \(A\) is upper triangular, so is \(xI-A\text{,}\) and therefore \(\det(xI-A)\) is just the product of the diagonal entries. That is,
\begin{equation*} c_A(x) = (x-\lambda_1)(x-\lambda_2)\cdots (x-\lambda_n)\text{,} \end{equation*}
so
\begin{equation*} c_A(A) = (A-\lambda_1I)(A-\lambda_2I)\cdots (A-\lambda_nI)\text{.} \end{equation*}
Since the first column of \(A\) is \(\bbm \lambda_1\amp 0 \amp \cdots \amp 0\ebm^T\text{,}\) the first column of \(A-\lambda_1I\) is identically zero. The second column of \(A-\lambda_2I\) similarly has the form \(\bbm k\amp 0\amp\cdots\amp 0\ebm^T\) for some number \(k\text{.}\)
It follows that the first two columns of \((A-\lambda_1I)(A-\lambda_2I)\) are identically zero. Since only the first two entries in the third column of \((A-\lambda_3I)\) can be nonzero, we find that the first three columns of \((A-\lambda_1I)(A-\lambda_2I)(A-\lambda_3I)\) are zero, and so on.
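As a sanity check, here is the theorem in action on the matrix from Exercise 4.4.12, whose characteristic polynomial works out to \(x^2-5x-6\):

    from sympy import Matrix, I, eye, symbols

    x = symbols('x')
    A = Matrix([[4, 3 - I], [3 + I, 1]])
    A.charpoly(x).as_expr()   # x**2 - 5*x - 6
    A**2 - 5*A - 6*eye(2)     # the 2x2 zero matrix, as promised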

Exercises 4.4.3 Exercises

1.

Suppose \(A\) is a \(3 \times 3\) matrix with real entries that has a complex eigenvalue \({7-i}\) with corresponding eigenvector \({\left[\begin{array}{c} -3+9i\cr 1\cr -6i \end{array}\right]}\text{.}\) Find another eigenvalue and eigenvector for \(A\text{.}\)

2.

Give an example of a \(2\times 2\) matrix with no real eigenvalues.

3.

Find all the eigenvalues (real and complex) of the matrix
\begin{equation*} M = {\left[\begin{array}{ccc} 5 \amp 8 \amp 17\cr 3 \amp 0 \amp 5\cr -3 \amp -3 \amp -8 \end{array}\right]}. \end{equation*}

4.

Find all the eigenvalues (real and complex) of the matrix
\begin{equation*} M = {\left[\begin{array}{cccc} -7 \amp 0 \amp -1 \amp -3\cr -13 \amp -1 \amp -4 \amp -5\cr 0 \amp 0 \amp 0 \amp 0\cr 19 \amp 0 \amp 5 \amp 7 \end{array}\right]}. \end{equation*}

5.

Let \(M = {\left[\begin{array}{cc} -4 \amp 9\cr -9 \amp -4 \end{array}\right]}.\) Find formulas for the entries of \(M^n\text{,}\) where \(n\) is a positive integer. (Your formulas should not contain complex numbers.)

6.

Let
\begin{equation*} M = {\left[\begin{array}{ccc} -2 \amp -4 \amp 4\cr 4 \amp -2 \amp -8\cr 0 \amp 0 \amp -2 \end{array}\right]}. \end{equation*}
Find formulas for the entries of \(M^n\text{,}\) where \(n\) is a positive integer. (Your formulas should not contain complex numbers.)

7.

Let \(M = {\left[\begin{array}{ccc} -1 \amp 3 \amp 2\cr 3 \amp -2 \amp -2\cr 2 \amp -2 \amp -3 \end{array}\right]}\text{.}\) Find \(c_1\text{,}\) \(c_2\text{,}\) and \(c_3\) such that \(M^3 + c_1 M^2 + c_2 M + c_3 I_3 = 0\text{,}\) where \(I_3\) is the \(3\times 3\) identity matrix.