Power iteration

In mathematics, the power iteration is an eigenvalue algorithm: given a matrix $A$ , the algorithm produces a number $\lambda$ , which is the greatest (in absolute value) eigenvalue of $A$ , and a nonzero vector $v$ , a corresponding eigenvector of $\lambda$ , such that $Av=\lambda v$ . The algorithm is also known as the von Mises iteration.^[1]

The power iteration is a very simple algorithm, but it may converge slowly. It accesses $A$ only through matrix-vector products and does not compute a matrix decomposition, so it can be used when $A$ is a very large sparse matrix.
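Because the iteration touches $A$ only through products $Ab$, the matrix never has to be stored densely or factored. The following is a minimal Python sketch, assuming a dictionary-of-keys sparse format `{(row, col): value}` chosen purely for illustration; the names `sparse_matvec` and `power_iteration_sparse` are hypothetical.

```python
import math


def sparse_matvec(entries, x, n):
    """Return A @ x for an n-by-n sparse matrix stored as {(row, col): value}."""
    y = [0.0] * n
    for (i, j), a_ij in entries.items():
        y[i] += a_ij * x[j]  # only the stored (nonzero) entries are visited
    return y


def power_iteration_sparse(entries, n, num_iters=100):
    """Run power iteration using only sparse matrix-vector products.

    Starts from the all-ones vector (an assumption; any vector with a nonzero
    component along the dominant eigenvector works). Returns (norm, b), where
    norm is ||A b|| from the last step and b is the last normalized iterate.
    """
    b = [1.0] * n
    norm = 0.0
    for _ in range(num_iters):
        y = sparse_matvec(entries, b, n)
        norm = math.sqrt(sum(v * v for v in y))
        b = [v / norm for v in y]
    return norm, b


# Upper-triangular 3x3 example with eigenvalues 3, 1 and 0.5.
norm, b = power_iteration_sparse({(0, 0): 3.0, (0, 1): 0.1, (1, 1): 1.0, (2, 2): 0.5}, 3)
print(norm)  # close to 3, the dominant eigenvalue
```

Each step visits only the stored entries, so apart from allocating the length-$n$ output its cost grows with the number of nonzeros rather than with $n^{2}$.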

The method

The power iteration algorithm starts with a vector $b_{0}$ , which may be an approximation to the dominant eigenvector or a random vector. The method is described by the recurrence relation

b_{k+1}={\frac {Ab_{k}}{\|Ab_{k}\|}}

So, at every iteration, the vector $b_{k}$ is multiplied by the matrix $A$ and normalized.
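A direct transcription of this recurrence, as a minimal Python sketch on plain lists (the helper names `matvec` and `power_iteration` are illustrative, not from any library):

```python
import math


def matvec(A, x):
    """Dense matrix-vector product A @ x, with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def power_iteration(A, b0, num_iters):
    """Apply b_{k+1} = A b_k / ||A b_k|| (Euclidean norm) num_iters times."""
    b = list(b0)
    for _ in range(num_iters):
        y = matvec(A, b)
        norm = math.sqrt(sum(v * v for v in y))
        b = [v / norm for v in y]
    return b


# [[2, 1], [1, 2]] has eigenvalues 3 and 1; the dominant eigenvector is (1, 1)/sqrt(2).
b = power_iteration([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0], 60)
print(b)  # both entries close to 1/sqrt(2)
```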

If we assume $A$ has an eigenvalue that is strictly greater in magnitude than its other eigenvalues and the starting vector $b_{0}$ has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue, then a subsequence $\left(b_{{k}}\right)$ converges to an eigenvector associated with the dominant eigenvalue.
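Both assumptions matter in practice. The following Python sketch (the helper `power_iteration_trace` is hypothetical) shows each one failing on a $2\times 2$ example. With $A=\operatorname{diag}(2,1)$ and $b_{0}=(0,1)$, the starting vector has no component along the dominant eigenvector, so the iterates stay at the eigenvector for $1$. With $A={\begin{bmatrix}0&1\\1&0\end{bmatrix}}$, whose eigenvalues $1$ and $-1$ have equal magnitude, the iterates oscillate. The first case relies on exact zeros; with generic floating-point data, rounding usually introduces a tiny component along the dominant eigenvector that eventually grows.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def power_iteration_trace(A, b0, num_iters):
    """Return [b_0, b_1, ..., b_num_iters] produced by normalized power iteration."""
    trace = [list(b0)]
    for _ in range(num_iters):
        y = matvec(A, trace[-1])
        norm = math.sqrt(sum(v * v for v in y))
        trace.append([v / norm for v in y])
    return trace


# Second assumption violated: b0 = (0, 1) has c1 = 0 for the dominant eigenvalue 2.
stuck = power_iteration_trace([[2.0, 0.0], [0.0, 1.0]], [0.0, 1.0], 20)
print(stuck[-1])  # [0.0, 1.0]: the eigenvector for eigenvalue 1, not 2

# First assumption violated: eigenvalues 1 and -1 tie in magnitude.
swap = power_iteration_trace([[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0], 4)
print(swap)  # alternates between [1.0, 0.0] and [0.0, 1.0]
```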

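Without the two assumptions above, the sequence $\left(b_{{k}}\right)$ does not necessarily converge. Even when they hold, the sequence $\left(b_{{k}}\right)$ itself need not converge; what can be shown (see the analysis below) is that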

b_k = e^{i \phi_k} v_1 + r_k

where $v_{1}$ is an eigenvector associated with the dominant eigenvalue, and $\|r_{{k}}\|\rightarrow 0$ . The presence of the term $e^{{i\phi _{{k}}}}$ implies that $\left(b_{{k}}\right)$ does not converge unless $e^{{i\phi _{{k}}}}=1$ . Under the two assumptions listed above, the sequence $\left(\mu _{{k}}\right)$ defined by

\mu _{{k}}={\frac {b_{{k}}^{{*}}Ab_{{k}}}{b_{{k}}^{{*}}b_{{k}}}}

converges to the dominant eigenvalue.
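When the dominant eigenvalue is real and negative, $e^{i\phi _{k}}=(-1)^{k}$, so $b_{k}$ flips sign at every step while $\mu _{k}$ still converges. A minimal Python sketch for real data, where $b_{k}^{*}$ reduces to $b_{k}^{\top }$; the matrix $A=\operatorname{diag}(-2,1)$ is an arbitrary example and the helper names are illustrative:

```python
import math


def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(x, y):
    """Real inner product x^T y."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


def rayleigh_quotient(A, b):
    """mu = b^T A b / b^T b, the real case of b^* A b / b^* b."""
    return dot(b, matvec(A, b)) / dot(b, b)


A = [[-2.0, 0.0], [0.0, 1.0]]  # dominant eigenvalue -2
b = [1.0, 1.0]
history = []  # (first component of b_k, mu_k) for k = 1, 2, ...
for _ in range(40):
    y = matvec(A, b)
    norm = math.sqrt(dot(y, y))
    b = [v / norm for v in y]
    history.append((b[0], rayleigh_quotient(A, b)))

print(history[-2][0], history[-1][0])  # about -1.0 and 1.0: b_k keeps flipping sign
print(history[-1][1])                  # about -2.0: mu_k converges anyway
```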

One may compute the iteration with the following algorithm:

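#include <math.h>

// Runs num_iterations steps of power iteration on the n-by-n matrix A.
// On entry b holds the starting vector b0; on exit it holds the last iterate.
// tmp is caller-provided scratch space of length n. Returns the last norm.
double power_iteration(int n, double A[n][n], double b[n], double tmp[n],
                       int num_iterations)
{
    double norm = 0.0;
    for (int iter = 0; iter < num_iterations; iter++) {
        // calculate the matrix-by-vector product Ab
        for (int i = 0; i < n; i++) {
            tmp[i] = 0.0;
            for (int j = 0; j < n; j++) {
                // dot product of i-th row in A with the column vector b
                tmp[i] += A[i][j] * b[j];
            }
        }

        // calculate the length of the resulting vector:
        // if v = (v1 v2 ... vn), then ||v|| = sqrt(v1*v1 + v2*v2 + ... + vn*vn)
        double norm_sq = 0.0;
        for (int k = 0; k < n; k++) {
            norm_sq += tmp[k] * tmp[k];
        }

        norm = sqrt(norm_sq);

        // normalize b to unit vector for next iteration
        for (int k = 0; k < n; k++) {
            b[k] = tmp[k] / norm;
        }
    }
    return norm;
}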

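Under the assumptions above, the value of $norm$ converges to the absolute value of the dominant eigenvalue, and the vector $b$ approaches an associated eigenvector up to a factor of modulus one (for example, its sign alternates when the dominant eigenvalue is negative).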

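Note: The above code assumes that the entries of $A$ and $b$ are real. To handle complex values, declare $A$, $b$ and $tmp$ with a complex type and change $tmp[k]*tmp[k]$ to $conj(tmp[k])*tmp[k]$ , whose value is real. The matrix-vector product itself uses $A[i][j]$ unchanged; no conjugation is needed there.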
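For complex entries, only the squared norm needs conjugation, since $|z|^{2}={\overline {z}}\,z$; the product $Ab$ is the ordinary matrix-vector product. A minimal Python sketch using the built-in `complex` type (the function name `power_iteration_complex` is hypothetical):

```python
import math


def power_iteration_complex(A, b0, num_iters):
    """Power iteration for complex A and b0 given as lists of Python numbers.

    Returns (norm, b) after num_iters steps, mirroring the variables
    norm, tmp and b of the algorithm above.
    """
    n = len(b0)
    b = list(b0)
    norm = 0.0
    for _ in range(num_iters):
        # Matrix-vector product: A[i][j] is used as is, without conjugation.
        tmp = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
        # Squared Euclidean norm: conj(tmp[k]) * tmp[k] is real and nonnegative.
        norm_sq = sum((z.conjugate() * z).real for z in tmp)
        norm = math.sqrt(norm_sq)
        b = [z / norm for z in tmp]
    return norm, b


# diag(2i, 1): the dominant eigenvalue is 2i, so norm should approach |2i| = 2.
norm, b = power_iteration_complex([[2j, 0], [0, 1]], [1 + 0j, 1 + 0j], 60)
print(norm)
```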

This algorithm is used, for example, to calculate Google's PageRank.

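The method can also be used to estimate the spectral radius (the largest absolute value of the eigenvalues) of a real matrix by computing the Rayleigh quotient

{\frac {b_{k}^{\top }Ab_{k}}{b_{k}^{\top }b_{k}}}={\frac {\|Ab_{k}\|\,b_{k+1}^{\top }b_{k}}{b_{k}^{\top }b_{k}}},

whose absolute value converges to the spectral radius under the assumptions above. The second form reuses the quantities $\|Ab_{k}\|$ and $b_{k+1}$ that the iteration already computes.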
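A minimal Python sketch of this estimate for a real matrix (helper names are illustrative). It also checks numerically that $b_{k}^{\top }Ab_{k}=\|Ab_{k}\|\,b_{k+1}^{\top }b_{k}$, which holds because $b_{k+1}=Ab_{k}/\|Ab_{k}\|$.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(x, y):
    """Real inner product x^T y."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


def spectral_radius_estimate(A, b0, num_iters):
    """Return |b_k^T A b_k / b_k^T b_k| after num_iters power-iteration steps."""
    b = list(b0)
    for _ in range(num_iters):
        y = matvec(A, b)
        norm = math.sqrt(dot(y, y))
        b = [v / norm for v in y]
    return abs(dot(b, matvec(A, b)) / dot(b, b))


A = [[2.0, 1.0], [1.0, 3.0]]  # symmetric; eigenvalues (5 +/- sqrt(5)) / 2
rho = spectral_radius_estimate(A, [1.0, 0.0], 50)
print(rho, (5 + math.sqrt(5)) / 2)  # the two values should agree closely

# One step of the identity b_k^T A b_k = ||A b_k|| * b_{k+1}^T b_k.
b_k = [0.6, 0.8]
Ab_k = matvec(A, b_k)
norm_Ab_k = math.sqrt(dot(Ab_k, Ab_k))
b_next = [v / norm_Ab_k for v in Ab_k]
print(dot(b_k, Ab_k), norm_Ab_k * dot(b_next, b_k))  # equal up to rounding
```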

Analysis

Let $A$ be decomposed into its Jordan canonical form: $A=VJV^{-1}$ , where the first column of $V$ is an eigenvector of $A$ corresponding to the dominant eigenvalue $\lambda _{1}$ . Since the dominant eigenvalue of $A$ is unique, the first Jordan block of $J$ is the $1\times 1$ matrix ${\begin{bmatrix}\lambda _{1}\end{bmatrix}}$ , where $\lambda _{1}$ is the largest eigenvalue of $A$ in magnitude. Since $V$ is invertible, its columns $v_{1},\ldots ,v_{n}$ form a basis, so the starting vector $b_{0}$ can be written as a linear combination of them: $b_{0}=c_{1}v_{1}+c_{2}v_{2}+\cdots +c_{n}v_{n}$ . By assumption, $b_{0}$ has a nonzero component in the direction of the dominant eigenvector $v_{1}$ , so $c_{1}\neq 0$ .

The computationally useful recurrence relation for $b_{k+1}$ can be rewritten as $b_{k+1}={\frac {Ab_{k}}{\|Ab_{k}\|}}={\frac {A^{k+1}b_{0}}{\|A^{k+1}b_{0}\|}}$ , because each normalization only rescales the vector by a positive factor. The closed form on the right is more amenable to the following analysis:
$\displaystyle {\begin{array}{lcl}b_{{k}}&=&{\frac {A^{{k}}b_{{0}}}{\|A^{{k}}b_{{0}}\|}}\\&=&{\frac {\left(VJV^{{-1}}\right)^{{k}}b_{{0}}}{\|\left(VJV^{{-1}}\right)^{{k}}b_{{0}}\|}}\\&=&{\frac {VJ^{{k}}V^{{-1}}b_{{0}}}{\|VJ^{{k}}V^{{-1}}b_{{0}}\|}}\\&=&{\frac {VJ^{{k}}V^{{-1}}\left(c_{{1}}v_{{1}}+c_{{2}}v_{{2}}+\cdots +c_{{n}}v_{{n}}\right)}{\|VJ^{{k}}V^{{-1}}\left(c_{{1}}v_{{1}}+c_{{2}}v_{{2}}+\cdots +c_{{n}}v_{{n}}\right)\|}}\\&=&{\frac {VJ^{{k}}\left(c_{{1}}e_{{1}}+c_{{2}}e_{{2}}+\cdots +c_{{n}}e_{{n}}\right)}{\|VJ^{{k}}\left(c_{{1}}e_{{1}}+c_{{2}}e_{{2}}+\cdots +c_{{n}}e_{{n}}\right)\|}}\\&=&\left({\frac {\lambda _{{1}}}{|\lambda _{{1}}|}}\right)^{{k}}{\frac {c_{{1}}}{|c_{{1}}|}}{\frac {v_{{1}}+{\frac {1}{c_{{1}}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{{k}}\left(c_{{2}}e_{{2}}+\cdots +c_{{n}}e_{{n}}\right)}{\|v_{{1}}+{\frac {1}{c_{{1}}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{{k}}\left(c_{{2}}e_{{2}}+\cdots +c_{{n}}e_{{n}}\right)\|}}\end{array}}$
The expression above simplifies as $k\rightarrow \infty$ , because
$\left({\frac {1}{\lambda _{{1}}}}J\right)^{{k}}={\begin{bmatrix}[1]&&&&\\&\left({\frac {1}{\lambda _{{1}}}}J_{{2}}\right)^{{k}}&&&\\&&\ddots &\\&&&\left({\frac {1}{\lambda _{{1}}}}J_{{m}}\right)^{{k}}\\\end{bmatrix}}\rightarrow {\begin{bmatrix}1&&&&\\&0&&&\\&&\ddots &\\&&&0\\\end{bmatrix}}$ as $k\rightarrow \infty$ .
The limit follows from the fact that, for $i\geq 2$ , the eigenvalue of ${\frac {1}{\lambda _{{1}}}}J_{{i}}$ is less than 1 in magnitude, so $\left({\frac {1}{\lambda _{{1}}}}J_{{i}}\right)^{{k}}\rightarrow 0$ as $k\rightarrow \infty$ .
It follows that
${\frac {1}{c_{{1}}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{{k}}\left(c_{{2}}e_{{2}}+\cdots +c_{{n}}e_{{n}}\right)\rightarrow 0$ as $k\rightarrow \infty$ .
Using this fact, $b_{{k}}$ can be written in a form that emphasizes its relationship with $v_{{1}}$ when $k$ is large:
${\begin{matrix}b_{{k}}&=&\left({\frac {\lambda _{{1}}}{|\lambda _{{1}}|}}\right)^{{k}}{\frac {c_{{1}}}{|c_{{1}}|}}{\frac {v_{{1}}+{\frac {1}{c_{{1}}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{{k}}\left(c_{{2}}e_{{2}}+\cdots +c_{{n}}e_{{n}}\right)}{\|v_{{1}}+{\frac {1}{c_{{1}}}}V\left({\frac {1}{\lambda _{1}}}J\right)^{{k}}\left(c_{{2}}e_{{2}}+\cdots +c_{{n}}e_{{n}}\right)\|}}&=&e^{{i\phi _{{k}}}}{\frac {c_{{1}}}{|c_{{1}}|}}v_{{1}}+r_{{k}}\end{matrix}}$ where $e^{{i\phi _{{k}}}}=\left(\lambda _{{1}}/|\lambda _{{1}}|\right)^{{k}}$ and $\|r_{{k}}\|\rightarrow 0$ as $k\rightarrow \infty$ .
The sequence $\left(b_{{k}}\right)$ is bounded, so it contains a convergent subsequence. Note that the eigenvector corresponding to the dominant eigenvalue is only unique up to a scalar, so although the sequence $\left(b_{{k}}\right)$ may not converge, $b_{{k}}$ is nearly an eigenvector of A for large k.
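The Jordan-form argument also covers matrices that are not diagonalizable. A minimal Python sketch with the arbitrary example $A={\begin{bmatrix}3&0&0\\0&2&1\\0&0&2\end{bmatrix}}$, which has a $2\times 2$ Jordan block for the eigenvalue $2$ and dominant eigenvalue $\lambda _{1}=3$ with eigenvector $v_{1}=e_{1}$. Since $\lambda _{1}>0$, the phase factor is $1$ here, and the sketch measures how far $b_{k}$ is from the direction of $v_{1}$ by the norm of its component orthogonal to $v_{1}$. Starting from $b_{0}=(1,1,1)$ one has $A^{k}b_{0}=(3^{k},\,2^{k}+k2^{k-1},\,2^{k})$, so this component decays like $k(2/3)^{k}$ rather than $(2/3)^{k}$. The helper name `orthogonal_residuals` is hypothetical.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def orthogonal_residuals(A, b0, v1, num_iters):
    """Return [||b_k - (v1^T b_k) v1|| for k = 0..num_iters]; v1 must have unit length."""
    def residual(b):
        c = sum(v * x for v, x in zip(v1, b))
        return math.sqrt(sum((x - c * v) ** 2 for x, v in zip(b, v1)))

    norm0 = math.sqrt(sum(x * x for x in b0))
    b = [x / norm0 for x in b0]
    out = [residual(b)]
    for _ in range(num_iters):
        y = matvec(A, b)
        norm = math.sqrt(sum(v * v for v in y))
        b = [v / norm for v in y]
        out.append(residual(b))
    return out


A = [[3.0, 0.0, 0.0],
     [0.0, 2.0, 1.0],   # Jordan block [[2, 1], [0, 2]] in the lower-right corner
     [0.0, 0.0, 2.0]]
res = orthogonal_residuals(A, [1.0, 1.0, 1.0], [1.0, 0.0, 0.0], 150)
print(res[100] / (2 / 3) ** 100)  # roughly sqrt(51**2 + 1), i.e. about 51
print(res[150])                   # tiny: b_k has essentially aligned with v1
```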

Alternatively, if $A$ is diagonalizable, then the following proof yields the same result.
Let $\lambda _{1},\lambda _{2},\ldots ,\lambda _{m}$ be the $m$ eigenvalues (counted with multiplicity) of $A$ and let $v_{1},v_{2},\ldots ,v_{m}$ be corresponding linearly independent eigenvectors; they form a basis because $A$ is diagonalizable. Suppose that $\lambda _{1}$ is the dominant eigenvalue, so that $|\lambda _{1}|>|\lambda _{j}|$ for $j>1$ .

The initial vector $b_{0}$ can be written:

b_{0}=c_{{1}}v_{{1}}+c_{{2}}v_{{2}}+\cdots +c_{{m}}v_{{m}}.

If $b_{0}$ is chosen randomly (with uniform probability), then $c_{1}\neq 0$ with probability 1. Now,

{\begin{array}{lcl}A^{{k}}b_{0}&=&c_{{1}}A^{{k}}v_{{1}}+c_{{2}}A^{{k}}v_{{2}}+\cdots +c_{{m}}A^{{k}}v_{{m}}\\&=&c_{{1}}\lambda _{{1}}^{{k}}v_{{1}}+c_{{2}}\lambda _{{2}}^{{k}}v_{{2}}+\cdots +c_{{m}}\lambda _{{m}}^{{k}}v_{{m}}\\&=&c_{{1}}\lambda _{{1}}^{{k}}\left(v_{{1}}+{\frac {c_{{2}}}{c_{{1}}}}\left({\frac {\lambda _{{2}}}{\lambda _{{1}}}}\right)^{{k}}v_{{2}}+\cdots +{\frac {c_{{m}}}{c_{{1}}}}\left({\frac {\lambda _{{m}}}{\lambda _{{1}}}}\right)^{{k}}v_{{m}}\right).\end{array}}

The expression within parentheses converges to $v_{1}$ because $|\lambda _{j}/\lambda _{1}|<1$ for $j>1$ . On the other hand, we have

b_{k}={\frac {A^{k}b_{0}}{\|A^{k}b_{0}\|}}.

Therefore, $b_{k}$ converges to (a multiple of) the eigenvector $v_{1}$ . The convergence is geometric, with ratio

\left|{\frac {\lambda _{2}}{\lambda _{1}}}\right|,

where $\lambda _{2}$ denotes the second dominant eigenvalue. Thus, the method converges slowly if there is an eigenvalue close in magnitude to the dominant eigenvalue.
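The geometric rate can be observed numerically. For the arbitrary example $A={\begin{bmatrix}2&1\\1&2\end{bmatrix}}$ (eigenvalues $3$ and $1$, so $|\lambda _{2}/\lambda _{1}|=1/3$) and $b_{0}=(1,0)$, one has $A^{k}b_{0}={\tfrac {1}{2}}\left(3^{k}+1,\,3^{k}-1\right)$, and the component of $b_{k}$ orthogonal to $v_{1}=(1,1)/{\sqrt {2}}$ has norm $1/{\sqrt {9^{k}+1}}$. A minimal Python sketch checking that successive errors shrink by a factor close to $1/3$:

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1
b = [1.0, 0.0]                # b_0
errs = []                     # errs[k] = norm of the part of b_k orthogonal to (1, 1)/sqrt(2)
for k in range(16):
    # The part of b orthogonal to (1, 1)/sqrt(2) is ((b0 - b1)/2, (b1 - b0)/2),
    # whose norm is |b0 - b1| / sqrt(2).
    errs.append(abs(b[0] - b[1]) / math.sqrt(2))
    y = [A[0][0] * b[0] + A[0][1] * b[1], A[1][0] * b[0] + A[1][1] * b[1]]
    norm = math.hypot(y[0], y[1])
    b = [y[0] / norm, y[1] / norm]

for k in range(1, 16):
    closed_form = 1 / math.sqrt(9 ** k + 1)
    print(k, errs[k], closed_form, errs[k] / errs[k - 1])  # last column tends to 1/3
```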

Applications

Although the power iteration method approximates only one eigenvalue of a matrix, it remains useful for certain computational problems. For instance, Google uses it to calculate the PageRank of documents in its search engine,^[2] and Twitter uses it to show users recommendations of whom to follow.^[3] For matrices that are well-conditioned and as sparse as the web matrix, the power iteration method can be more efficient than other methods of finding the dominant eigenvector.
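PageRank can be phrased as power iteration on a column-stochastic "Google matrix" $G=dM+{\tfrac {1-d}{n}}\mathbf {1} \mathbf {1} ^{\top }$, where $M$ spreads each page's score evenly over its outgoing links and $d$ is a damping factor. Because $G$ preserves the sum of the entries of any vector, normalizing by the 1-norm happens automatically. A minimal Python sketch, assuming the commonly cited $d=0.85$, a tiny hypothetical 3-page link graph, and no pages without outgoing links; this is the textbook formulation, not a description of Google's production system:

```python
def pagerank(links, n, d=0.85, num_iters=100):
    """Damped PageRank by power iteration.

    links maps each page index to the list of pages it links to. Every page is
    assumed to have at least one outgoing link (no "dangling" pages).
    """
    r = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(num_iters):
        # r <- G r, computed without forming G: teleportation term plus link term.
        new_r = [(1.0 - d) / n] * n
        for page, targets in links.items():
            share = d * r[page] / len(targets)
            for t in targets:
                new_r[t] += share
        r = new_r
    return r


# Hypothetical graph: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
ranks = pagerank({0: [1, 2], 1: [2], 2: [0]}, 3)
print(ranks)  # page 2 ranks highest, page 1 lowest; the entries sum to 1
```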

Some of the more advanced eigenvalue algorithms can be understood as variations of the power iteration. For instance, the inverse iteration method applies power iteration to the matrix $A^{-1}$ . Other algorithms look at the whole subspace generated by the vectors $b_{k}$ . This subspace is known as the Krylov subspace. It can be computed by Arnoldi iteration or Lanczos iteration.
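In practice $A^{-1}$ is usually not formed explicitly; each step instead solves the linear system $Ay=b_{k}$ and normalizes $y$. A minimal Python sketch under that assumption, for a small dense matrix; the Gaussian-elimination helper `solve` is written here for illustration only (production code typically calls a library solver and reuses one factorization for every step). Power iteration on $A^{-1}$ targets the eigenvalue of $A$ that is smallest in magnitude.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]  # augmented matrix [A | rhs]
    for col in range(n):
        # Swap the row with the largest pivot candidate into position.
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def inverse_iteration(A, b0, num_iters):
    """Power iteration on A^{-1}: each step solves A y = b_k instead of forming A^{-1}.

    Returns (lam, b): the Rayleigh quotient b^T A b of the final unit vector b,
    which estimates the eigenvalue of A of smallest magnitude, and b itself.
    """
    b = list(b0)
    for _ in range(num_iters):
        y = solve(A, b)
        norm = math.sqrt(sum(v * v for v in y))
        b = [v / norm for v in y]
    Ab = [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in A]
    lam = sum(b_i * ab_i for b_i, ab_i in zip(b, Ab))
    return lam, b


# [[2, 1], [1, 2]] has eigenvalues 3 and 1; inverse iteration targets 1.
lam, b = inverse_iteration([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0], 50)
print(lam)  # close to 1.0; b is close to (1, -1)/sqrt(2)
```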

References

[1] Richard von Mises and H. Pollaczek-Geiringer. Praktische Verfahren der Gleichungsauflösung. ZAMM - Zeitschrift für Angewandte Mathematik und Mechanik 9, 152–164 (1929).
[2] Ilse Ipsen and Rebecca M. Wills (5–8 May 2005). "7th IMACS International Symposium on Iterative Methods in Scientific Computing". Fields Institute, Toronto, Canada.
[3] Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Bosagh Zadeh. WTF: The who-to-follow system at Twitter. Proceedings of the 22nd International Conference on World Wide Web.

External links

Power method, part of lecture notes on numerical linear algebra by E. Bruce Pitman, State University of New York.
Module for the Power Method


This article is issued from Wikipedia - version of the 10/9/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.