
Section 5.5 Generalized eigenspaces

Example 5.3.9 showed us that if \(V=U\oplus W\text{,}\) where \(U\) and \(W\) are \(T\)-invariant, then the matrix \(M_B(T)\) has block-diagonal form \(\bbm A \amp 0\\0\amp B\ebm\text{,}\) as long as the basis \(B\) is the union of bases of \(U\) and \(W\text{.}\)
We want to take this idea further. If \(V = U_1\oplus U_2\oplus \cdots \oplus U_k\text{,}\) where each subspace \(U_j\) is \(T\)-invariant, then with respect to a basis \(B\) consisting of basis vectors for each subspace, we will have
\begin{equation*} M_B(T)=\bbm A_1 \amp 0 \amp \cdots \amp 0\\ 0 \amp A_2 \amp \cdots \amp 0\\ \vdots \amp \vdots \amp \ddots \amp \vdots\\ 0 \amp 0 \amp \cdots \amp A_k\ebm\text{,} \end{equation*}
where each \(A_j\) is the matrix of \(T|_{U_j}\) with respect to some basis of \(U_j\text{.}\)
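To see this in action, here is a minimal computational sketch using SymPy (the blocks \(A_1, A_2\) and the basis vectors below are made up for illustration): we construct an operator with two known invariant subspaces, and confirm that changing to the basis \(B\) produces a block-diagonal matrix.

```python
from sympy import Matrix, BlockDiagMatrix

# Blocks A1, A2: matrices of T restricted to U and W (made-up values).
A1 = Matrix([[2, 1], [0, 2]])
A2 = Matrix([[3, 1], [1, 3]])

# Columns of P form the basis B = {u1, u2, w1, w2}, with U = span{u1, u2}
# and W = span{w1, w2}.
P = Matrix([[1, 1, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 1, 1],
            [0, 0, 0, 1]])

# Define T (via its matrix A in the standard basis) so that U and W
# are T-invariant.
A = P * Matrix(BlockDiagMatrix(A1, A2)) * P.inv()

# Changing coordinates to the basis B recovers the block-diagonal form M_B(T).
print(P.inv() * A * P)  # block diag(A1, A2)
```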
Our goal moving forward is twofold: one, to make the blocks as small as possible, so that \(M_B(T)\) is as close to diagonal as possible, and two, to make the blocks as simple as possible. Of course, if \(T\) is diagonalizable, then we can get all blocks down to size \(1\times 1\text{,}\) but this is not always possible.
Recall from Section 4.1 that if the characteristic polynomial of \(T\) (or equivalently, of any matrix representation \(A\) of \(T\)) is
\begin{equation*} c_T(x) = (x-\lambda_1)^{m_1}(x-\lambda_2)^{m_2}\cdots (x-\lambda_k)^{m_k}\text{,} \end{equation*}
then \(\dim E_{\lambda_j}(T)\leq m_j\) for each \(j=1,\ldots, k\text{,}\) and \(T\) is diagonalizable if and only if we have equality for each \(j\text{.}\) (This guarantees that we have sufficiently many independent eigenvectors to form a basis of \(V\text{.}\))
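As a quick illustration, the following SymPy sketch (with a made-up matrix, not an example from the text) compares the algebraic multiplicity \(m_j\) with \(\dim E_{\lambda_j}\) for each eigenvalue; since they disagree at \(\lambda=2\text{,}\) this matrix is not diagonalizable.

```python
from sympy import Matrix, eye, factor, symbols

x = symbols('x')

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])

print(factor(A.charpoly(x).as_expr()))  # (x - 2)**2 * (x - 3), up to ordering

for lam, m in A.eigenvals().items():
    geo = len((A - lam * eye(3)).nullspace())  # dim E_lambda
    print(f'lambda = {lam}: algebraic {m}, geometric {geo}')
# lambda = 2: algebraic 2, geometric 1  <- too few eigenvectors
# lambda = 3: algebraic 1, geometric 1
```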
Since eigenspaces are \(T\)-invariant, we see that being able to diagonalize \(T\) is equivalent to having the direct sum decomposition
\begin{equation*} V = E_{\lambda_1}(T)\oplus E_{\lambda_2}(T)\oplus \cdots \oplus E_{\lambda_k}(T)\text{.} \end{equation*}
If \(T\) cannot be diagonalized, it’s because we came up short on the number of eigenvectors, and the direct sum of all eigenspaces only produces some subspace of \(V\) of lower dimension. We now consider how one might enlarge a set of independent eigenvectors in some standard, and ideally optimal, way.
First, we note that for any operator \(T\text{,}\) the restriction of \(T\) to \(\ker T\) is the zero operator, since by definition, \(T(\vv)=\zer\) for all \(\vv\in\ker T\text{.}\) Since we define \(E_{\lambda}(T)=\ker (T-\lambda I)\text{,}\) it follows that \(T-\lambda I\) restricts to the zero operator on the eigenspace \(E_\lambda(T)\text{.}\) The idea is to relax the condition “identically zero” to something that will allow us to potentially enlarge some of our eigenspaces, so that we end up with enough vectors to span \(V\text{.}\)
It turns out that the correct replacement for “identically zero” is “nilpotent”. What we would like to find is some subspace \(G_\lambda(T)\) such that the restriction of \(T-\lambda I\) to \(G_\lambda(T)\) will be nilpotent. (Recall that this means \((T-\lambda I)^k = 0\) for some integer \(k\) when restricted to \(G_\lambda(T)\text{.}\)) The only problem is that we don’t (yet) know what this subspace should be. To figure it out, we rely on some ideas you may have explored in your last assignment.
For any operator \(T\text{,}\) we have \(\ker T^k\subseteq \ker T^{k+1}\) for each \(k\geq 1\text{,}\) and one can show that if \(\ker T^{k+1}=\ker T^k\) for some \(k\text{,}\) then \(\ker T^j=\ker T^k\) for all \(j\geq k\text{.}\) In other words, the kernels of successive powers of \(T\) can get bigger, but the moment the kernel doesn't change from one power to the next, it stops changing for all further powers of \(T\text{.}\) That is, we have a sequence of kernels of strictly increasing dimension until we reach a maximum, at which point the kernels stop growing. And of course, the maximum dimension cannot be more than the dimension of \(V\text{.}\)
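Here is a small SymPy sketch of this stabilization, using a made-up matrix for which \(N = A - 2I\) is nilpotent:

```python
from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [0, 2, 1],
            [0, 0, 2]])
N = A - 2 * eye(3)  # nilpotent: N**3 = 0

for k in range(1, 5):
    print(f'dim ker N^{k} =', len((N**k).nullspace()))
# Prints 1, 2, 3, 3: strictly increasing until a maximum, then constant.
```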

Definition 5.5.2.

Let \(T:V\to V\) be a linear operator, and let \(\lambda\) be an eigenvalue of \(T\text{.}\) The generalized eigenspace of \(T\) associated to the eigenvalue \(\lambda\) is denoted \(G_\lambda(T)\text{,}\) and defined as
\begin{equation*} G_\lambda(T) = \ker (T-\lambda I)^n\text{,} \end{equation*}
where \(n=\dim V\text{.}\)
Some remarks are in order. First, we can actually define \(G_\lambda(T)\) for any scalar \(\lambda\text{,}\) but this space will be trivial if \(\lambda\) is not an eigenvalue. Second, it is possible to show (although we will not do so here) that if \(\lambda\) is an eigenvalue with multiplicity \(m\text{,}\) then \(G_\lambda(T)=\ker (T-\lambda I)^m\text{.}\) (The kernels will usually have stopped growing well before we reach the power \(n=\dim V\text{,}\) but since they are all eventually equal, using \(n\) guarantees that we capture everything.)
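Since the definition is just a kernel, we can compute generalized eigenspaces directly. The sketch below (reusing the made-up matrix from the earlier multiplicity example) computes \(G_\lambda(T)=\ker (T-\lambda I)^n\) as a null space in SymPy and compares its dimension to that of the ordinary eigenspace.

```python
from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])
n = A.rows  # n = dim V

for lam in A.eigenvals():
    E = (A - lam * eye(n)).nullspace()        # basis of E_lambda
    G = ((A - lam * eye(n))**n).nullspace()   # basis of G_lambda
    print(f'lambda = {lam}: dim E = {len(E)}, dim G = {len(G)}')
# lambda = 2: dim E = 1, dim G = 2  (now equal to the multiplicity m = 2)
# lambda = 3: dim E = 1, dim G = 1
# The dimensions of the G's now sum to 3 = dim V.
```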
We will not prove it here (see Nicholson, or Axler), but the advantage of using generalized eigenspaces is that they are just big enough to cover all of \(V\text{:}\) as long as the characteristic polynomial of \(T\) factors completely, as above, we have \(\dim G_{\lambda_j}(T)=m_j\) for each \(j\text{,}\) and
\begin{equation*} V = G_{\lambda_1}(T)\oplus G_{\lambda_2}(T)\oplus \cdots \oplus G_{\lambda_k}(T)\text{.} \end{equation*}
For each eigenvalue \(\lambda_j\) of \(T\text{,}\) let \(l_j\) denote the smallest integer power such that \(G_{\lambda_j}(T) = \ker (T-\lambda_j I)^{l_j}\text{.}\) Then certainly we have \(l_j\leq m_j\) for each \(j\text{.}\) (Note also that if \(l_j=1\text{,}\) then \(G_{\lambda_j}(T)=E_{\lambda_j}(T)\text{.}\))
The polynomial \(m_T(x) = (x-\lambda_1)^{l_1}(x-\lambda_2)^{l_2}\cdots (x-\lambda_k)^{l_k}\) is the monic polynomial of smallest degree such that \(m_T(T)=0\text{.}\) It is called the minimal polynomial of \(T\text{.}\) Note that \(T\) is diagonalizable if and only if the minimal polynomial of \(T\) has no repeated roots; that is, if and only if \(l_j=1\) for each \(j\text{.}\)
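One way to find the exponents \(l_j\) computationally is to watch for the power at which each kernel stabilizes; the sketch below (same made-up matrix as before) assembles the minimal polynomial this way and checks that it annihilates \(A\text{.}\)

```python
from sympy import Matrix, Mul, eye, zeros, symbols

x = symbols('x')
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])
n = A.rows

# l_j is the first power at which ker (A - lam*I)**k stops growing,
# i.e. first equals ker (A - lam*I)**n.
factors = []
for lam in A.eigenvals():
    target = len(((A - lam * eye(n))**n).nullspace())
    l = next(k for k in range(1, n + 1)
             if len(((A - lam * eye(n))**k).nullspace()) == target)
    factors.append((x - lam)**l)

m_T = Mul(*factors)
print(m_T)  # (x - 2)**2 * (x - 3), up to ordering

# Sanity check: substituting A into m_T gives the zero matrix.
print((A - 2 * eye(n))**2 * (A - 3 * eye(n)) == zeros(n))  # True
```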
In Section 5.6, we’ll explore a systematic method for determining the generalized eigenspaces of a matrix, and in particular, for computing a basis for each generalized eigenspace, with respect to which the corresponding block in the block-diagonal form is especially simple.