Solution of a linear system

Figure: the steepest descent algorithm applied to the Wiener filter.[11]

Gradient descent can be used to solve a system of linear equations

$$A\mathbf{x} - \mathbf{b} = 0,$$

reformulated as a quadratic minimization problem. If the system matrix $A$ is real symmetric and positive-definite, an objective function is defined as the quadratic function

$$F(\mathbf{x}) = \mathbf{x}^{\mathsf{T}} A\mathbf{x} - 2\,\mathbf{x}^{\mathsf{T}}\mathbf{b},$$

so that

$$\nabla F(\mathbf{x}) = 2\left(A\mathbf{x} - \mathbf{b}\right).$$
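Setting this gradient to zero makes the connection to the linear system explicit; as a brief check (a standard argument, not quoted from the cited sources):

$$\nabla F(\mathbf{x}) = 2\left(A\mathbf{x} - \mathbf{b}\right) = \mathbf{0} \quad\Longleftrightarrow\quad A\mathbf{x} = \mathbf{b},$$

and because $A$ is positive-definite, $F$ is strictly convex, so this stationary point is the unique global minimizer.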
For a general real matrix $A$, linear least squares define

$$F(\mathbf{x}) = \left\|A\mathbf{x} - \mathbf{b}\right\|^{2}.$$

In traditional linear least squares for real $A$ and $\mathbf{b}$ the Euclidean norm is used, in which case

$$\nabla F(\mathbf{x}) = 2A^{\mathsf{T}}\left(A\mathbf{x} - \mathbf{b}\right).$$
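As an illustrative numerical check of the two gradient formulas above, one can compare them against central finite differences on small random test data; the matrices and vectors below are arbitrary assumptions made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5
M = rng.standard_normal((n, n))
A_spd = M @ M.T + n * np.eye(n)       # symmetric positive-definite test matrix
A_gen = rng.standard_normal((n, n))   # general square test matrix
b = rng.standard_normal(n)
x = rng.standard_normal(n)

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# F(x) = x^T A x - 2 x^T b with symmetric A has gradient 2 (A x - b).
F_quad = lambda v: v @ A_spd @ v - 2 * v @ b
print(np.allclose(num_grad(F_quad, x), 2 * (A_spd @ x - b), atol=1e-4))

# F(x) = ||A x - b||^2 has gradient 2 A^T (A x - b).
F_ls = lambda v: np.linalg.norm(A_gen @ v - b) ** 2
print(np.allclose(num_grad(F_ls, x), 2 * A_gen.T @ (A_gen @ x - b), atol=1e-4))
```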
The line search minimization, finding the locally optimal step size $\gamma$ on every iteration, can be performed analytically for quadratic functions, and explicit formulas for the locally optimal $\gamma$ are known.[5][12]
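For the symmetric positive-definite case, the optimal step along the residual direction $\mathbf{r} = \mathbf{b} - A\mathbf{x}$ can be obtained in closed form; the following is a standard one-line derivation, sketched here rather than quoted from the cited references:

$$F(\mathbf{x} + \gamma\mathbf{r}) = F(\mathbf{x}) - 2\gamma\,\mathbf{r}^{\mathsf{T}}\mathbf{r} + \gamma^{2}\,\mathbf{r}^{\mathsf{T}} A\mathbf{r}, \qquad \frac{\mathrm{d}}{\mathrm{d}\gamma} F(\mathbf{x} + \gamma\mathbf{r}) = 0 \;\Longrightarrow\; \gamma = \frac{\mathbf{r}^{\mathsf{T}}\mathbf{r}}{\mathbf{r}^{\mathsf{T}} A\mathbf{r}}.$$

This is the step size used in the algorithms below.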

For example, for a real symmetric and positive-definite matrix $A$, a simple algorithm can be as follows,[5]

repeat in the loop:
    $\mathbf{r} := \mathbf{b} - A\mathbf{x}$
    $\gamma := \mathbf{r}^{\mathsf{T}}\mathbf{r} / \left(\mathbf{r}^{\mathsf{T}} A\mathbf{r}\right)$
    $\mathbf{x} := \mathbf{x} + \gamma\mathbf{r}$
    if $\mathbf{r}^{\mathsf{T}}\mathbf{r}$ is sufficiently small, then exit loop
end repeat loop
return $\mathbf{x}$ as the result
To avoid multiplying by $A$ twice per iteration, we note that $\mathbf{x} := \mathbf{x} + \gamma\mathbf{r}$ implies $\mathbf{r} := \mathbf{r} - \gamma A\mathbf{r}$, which gives the traditional algorithm,[13]

$\mathbf{r} := \mathbf{b} - A\mathbf{x}$
repeat in the loop:
    $\gamma := \mathbf{r}^{\mathsf{T}}\mathbf{r} / \left(\mathbf{r}^{\mathsf{T}} A\mathbf{r}\right)$
    $\mathbf{x} := \mathbf{x} + \gamma\mathbf{r}$
    if $\mathbf{r}^{\mathsf{T}}\mathbf{r}$ is sufficiently small, then exit loop
    $\mathbf{r} := \mathbf{r} - \gamma A\mathbf{r}$
end repeat loop
return $\mathbf{x}$ as the result
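A minimal numpy sketch of this algorithm; the function name, stopping tolerance, and test problem below are choices made for illustration, not taken from the cited sources.

```python
import numpy as np

def steepest_descent_solve(A, b, x0=None, tol=1e-10, max_iter=10000):
    """Solve A x = b for symmetric positive-definite A by steepest descent.

    Uses the residual r = b - A x as the descent direction with the
    closed-form optimal step gamma = (r.r) / (r.A r); A is applied only
    once per iteration by updating r as r := r - gamma * A r.
    """
    x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    for _ in range(max_iter):
        if np.dot(r, r) <= tol:      # squared residual norm small enough: stop
            break
        Ar = A @ r
        gamma = np.dot(r, r) / np.dot(r, Ar)
        x = x + gamma * r
        r = r - gamma * Ar
    return x

# Small self-contained example with an SPD matrix.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = steepest_descent_solve(A, b)
print(x, np.allclose(A @ x, b, atol=1e-4))
```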
The method is rarely used for solving linear equations, with the conjugate gradient method being one of the most popular alternatives. The number of gradient descent iterations is commonly proportional to the spectral condition number $\kappa(A)$ of the system matrix $A$ (the ratio of the maximum to minimum eigenvalues of $A^{\mathsf{T}}A$), while the convergence of the conjugate gradient method is typically determined by the square root of the condition number, i.e., it is much faster. Both methods can benefit from preconditioning, where gradient descent may require fewer assumptions on the preconditioner.[13]
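To illustrate this dependence on the condition number, the sketch below counts iterations of plain steepest descent against a textbook conjugate gradient implementation on a hypothetical SPD test matrix whose eigenvalues are spread to give a condition number of about 10^3; the exact counts will vary, but steepest descent typically needs on the order of the condition number, and conjugate gradient on the order of its square root.

```python
import numpy as np

def gd_iterations(A, b, tol=1e-8, max_iter=200000):
    """Steepest descent on an SPD system; returns the iteration count."""
    x = np.zeros_like(b)
    r = b - A @ x
    for k in range(max_iter):
        if np.dot(r, r) < tol**2:
            return k
        Ar = A @ r
        gamma = np.dot(r, r) / np.dot(r, Ar)
        x = x + gamma * r
        r = r - gamma * Ar
    return max_iter

def cg_iterations(A, b, tol=1e-8, max_iter=200000):
    """Conjugate gradient on an SPD system; returns the iteration count."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = np.dot(r, r)
    for k in range(max_iter):
        if rs < tol**2:
            return k
        Ap = A @ p
        alpha = rs / np.dot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.dot(r, r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return max_iter

# SPD test matrix with eigenvalues spread from 1 to 1000 (condition number ~1e3).
rng = np.random.default_rng(0)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.logspace(0, 3, n)) @ Q.T
A = (A + A.T) / 2                     # enforce exact symmetry
b = rng.standard_normal(n)
print("steepest descent iterations:", gd_iterations(A, b))
print("conjugate gradient iterations:", cg_iterations(A, b))
```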

Solution of a non-linear system

Gradient descent can also be used to solve a system of nonlinear equations. Below is an example that shows how to use gradient descent to solve for three unknown variables, $x_1$, $x_2$, and $x_3$. This example shows one iteration of gradient descent.

Consider the nonlinear system of equations

$$\begin{cases} 3x_1 - \cos(x_2 x_3) - \tfrac{3}{2} = 0, \\ 4x_1^2 - 625x_2^2 + 2x_2 - 1 = 0, \\ \exp(-x_1 x_2) + 20x_3 + \tfrac{10\pi - 3}{3} = 0. \end{cases}$$
Let us introduce the associated function

$$G(\mathbf{x}) = \begin{bmatrix} 3x_1 - \cos(x_2 x_3) - \tfrac{3}{2} \\ 4x_1^2 - 625x_2^2 + 2x_2 - 1 \\ \exp(-x_1 x_2) + 20x_3 + \tfrac{10\pi - 3}{3} \end{bmatrix},$$

where

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
One might now define the objective function

$$F(\mathbf{x}) = \tfrac{1}{2} G^{\mathsf{T}}(\mathbf{x})\, G(\mathbf{x}) = \tfrac{1}{2}\left[ \left(3x_1 - \cos(x_2 x_3) - \tfrac{3}{2}\right)^{2} + \left(4x_1^2 - 625x_2^2 + 2x_2 - 1\right)^{2} + \left(\exp(-x_1 x_2) + 20x_3 + \tfrac{10\pi - 3}{3}\right)^{2} \right],$$
which we will attempt to minimize. As an initial guess, let us use

$$\mathbf{x}^{(0)} = \mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
We know that

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \gamma_0 \nabla F\left(\mathbf{x}^{(0)}\right) = \mathbf{x}^{(0)} - \gamma_0 J_G\left(\mathbf{x}^{(0)}\right)^{\mathsf{T}} G\left(\mathbf{x}^{(0)}\right),$$
where the Jacobian matrix $J_G$ is given by

$$J_G(\mathbf{x}) = \begin{bmatrix} 3 & x_3 \sin(x_2 x_3) & x_2 \sin(x_2 x_3) \\ 8x_1 & -1250x_2 + 2 & 0 \\ -x_2 \exp(-x_1 x_2) & -x_1 \exp(-x_1 x_2) & 20 \end{bmatrix}.$$
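A small numpy sketch of these definitions, assuming the system, objective, and Jacobian exactly as written above (here `J` stands for $J_G$); the functions are reused in the sketch at the end of this example.

```python
import numpy as np

def G(x):
    """Vector-valued residual G(x) of the nonlinear system above."""
    x1, x2, x3 = x
    return np.array([
        3 * x1 - np.cos(x2 * x3) - 1.5,
        4 * x1**2 - 625 * x2**2 + 2 * x2 - 1,
        np.exp(-x1 * x2) + 20 * x3 + (10 * np.pi - 3) / 3,
    ])

def F(x):
    """Objective F(x) = 1/2 * G(x)^T G(x)."""
    g = G(x)
    return 0.5 * g @ g

def J(x):
    """Jacobian of G, so that grad F(x) = J(x)^T G(x)."""
    x1, x2, x3 = x
    return np.array([
        [3.0, x3 * np.sin(x2 * x3), x2 * np.sin(x2 * x3)],
        [8 * x1, -1250 * x2 + 2, 0.0],
        [-x2 * np.exp(-x1 * x2), -x1 * np.exp(-x1 * x2), 20.0],
    ])
```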
We calculate:

$$J_G\left(\mathbf{x}^{(0)}\right) = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 20 \end{bmatrix}, \qquad G\left(\mathbf{x}^{(0)}\right) = \begin{bmatrix} -2.5 \\ -1 \\ 10.472 \end{bmatrix}.$$

Thus

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \gamma_0 \begin{bmatrix} -7.5 \\ -2 \\ 209.44 \end{bmatrix},$$

and

$$F\left(\mathbf{x}^{(0)}\right) = \tfrac{1}{2}\left( (-2.5)^2 + (-1)^2 + (10.472)^2 \right) = 58.456.$$
An animation showing the first 83 iterations of gradient descent applied to this example. Surfaces are isosurfaces of $F\left(\mathbf{x}^{(n)}\right)$ at the current guess $\mathbf{x}^{(n)}$, and arrows show the direction of descent. Due to a small and constant step size, the convergence is slow.

Now, a suitable $\gamma_0$ must be found such that

$$F\left(\mathbf{x}^{(1)}\right) \le F\left(\mathbf{x}^{(0)}\right).$$

This can be done with any of a variety of line search algorithms. One might also simply guess $\gamma_0 = 0.001$, which gives

$$\mathbf{x}^{(1)} = \begin{bmatrix} 0.0075 \\ 0.002 \\ -0.20944 \end{bmatrix}.$$
Evaluating the objective function at this value yields

$$F\left(\mathbf{x}^{(1)}\right) = \tfrac{1}{2}\left( (-2.48)^2 + (-1.00)^2 + (6.28)^2 \right) = 23.306.$$

The decrease from $F\left(\mathbf{x}^{(0)}\right) = 58.456$ to the next step's value of $F\left(\mathbf{x}^{(1)}\right) = 23.306$ is a sizable decrease in the objective function. Further steps would reduce its value further until an approximate solution to the system was found.
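Reusing the G, F, and J functions from the sketch above, the worked step can be reproduced and further steps taken; the backtracking line search below is an illustrative choice made here, not the rule used to produce the animation.

```python
import numpy as np

# Assumes G, F, and J from the earlier sketch are already defined.
x = np.zeros(3)                         # initial guess x^(0) = 0
print("F(x^(0)) =", F(x))               # about 58.456

grad = J(x).T @ G(x)                    # grad F(x^(0)) = [-7.5, -2, 209.44]
x1 = x - 0.001 * grad                   # the guessed step size gamma_0 = 0.001
print("x^(1)    =", x1)                 # about [0.0075, 0.002, -0.20944]
print("F(x^(1)) =", F(x1))              # about 23.3

# Further steps with a simple backtracking line search keep decreasing F.
x = x1
for _ in range(100):
    grad = J(x).T @ G(x)
    gamma = 0.01
    for _ in range(50):                 # halve the step until F decreases
        if F(x - gamma * grad) < F(x):
            break
        gamma *= 0.5
    x = x - gamma * grad
print("F after 100 more steps =", F(x))
```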

