Edit on GitHub

Newton's (Method's) Failure

Finding zeros of any old function is a common task, and using Newton's method is one of the best tools for carrying out this task. I've even written an old post that used this method.

However, Newton's method loses some of it's luster around repeated roots. Consider the iteration

$x_{n + 1} = x_n - \frac{f(x_n)}{f'(x_n)}$

On inspection, points where $f'(x_n) = 0$ seem to be problematic. However, when these points are also roots, the division $0 / 0$ may no longer be a problem. The case where

$f(x_n) = f'(x_n) = 0$

is exactly the case of a repeated root.

For a practical example, consider $f(x) = (x - 1)^2$ which has corresponding iteration function

$g(x) = x - \frac{(x - 1)^2}{2(x - 1)} = \frac{x + 1}{2}.$

Starting with $x_0 = 1 + 2^{-0} = 2,$ in exact arithmetic (i.e. no rounding) we'll always have $x_n = 1 + 2^{-n}$ since

$\frac{1 + x_n}{2} = \frac{2 + 2^{-n}}{2} = 1 + 2^{-n-1}.$

This sequence converges to $1$ and the error halves every term. However, in double precision, the sequence stops once $n = 53$: at that point $1 \oplus 2^{-53} = 0$.

However, if we told the computer that $f(x) = (x - 1)^2,$ there would be no reason to use Newton's method, the roots are clearly $1$ and $1$. Instead, the "data" the computer is given are the coefficients in $x^2 - 2x + 1$ and each term is computed in floating point

$x_{n + 1} = x_n \ominus \left(N_n \oslash D_n\right).$

where the numerator and denominator are given (with the help of Horner's method) by

$N_n = \left((x_n \ominus 2) \otimes x_n\right) \oplus 1, \quad D_n = \left(2 \otimes x_n\right) \ominus 2.$

Each floating point operation $(\otimes, \oslash, \oplus, \ominus)$ rounds the "exact" result to the nearest floating point number. For $x_1, \, \ldots, \, x_{27},$ there is no rounding needed and we get the same values that we got in exact arithmetic.

The real problem happens when $n = 27$. At that point

$(x_n \ominus 2) \otimes x_n = \left(-\left(1 - 2^{-27}\right)\right) \otimes \left(1 + 2^{-27}\right)$

and the exact result $1 - 2^{-54}$ needs to be rounded into floating point:

$N_{27} = \left(-(1 \ominus 2^{-54})\right) \oplus 1 = \left(-1\right) \oplus 1 = 0$

which forces $x_{28} = x_{27} \ominus 0$. After this point, the iteration stalls and every term after is equal to $x_{27}$.

Thus when the algorithm terminates on a computer (since two consecutive terms are identical), the actual error is $2^{-27}$. This error is much too large! Using the standard floating point type (double), the machine precision is $\varepsilon = 2^{-52}$ and our error is approximately $\sqrt{\varepsilon}$. In other words, we expected 52 bits of precision and only get 27!

In general, a root repeated $m$ times — i.e. with multiplicity $m$ — will only be accurate to $\frac{52}{m}$ bits, a very large penalty!

We can see this phenomenon in action, we define the following function to do one Newton step using Horner's method:

def horner_newton(coeffs, val):
degree = len(coeffs) - 1
deriv_factor = float(degree)
fx = fpx = 0.0
for i in range(degree):
fx = val * fx + coeffs[i]
fpx = val * fpx + deriv_factor * coeffs[i]
deriv_factor -= 1.0
# Add the last term to fx.
fx = val * fx + coeffs[-1]
return val - fx / fpx

Applying it to $(x^2 - 2)^2 = x^4 - 4x^2 + 4$ — which has two double roots — we get

>>> values = [1.0]
>>> coeffs = [1, 0, -4, 0, 4]
>>> next_val = horner_newton(coeffs, values[-1])
>>> while next_val not in values:
...     values.append(next_val)
...     next_val = horner_newton(coeffs, values[-1])
...
>>> len(values)
28
>>> values[-1]
1.4142135634821
>>> math.log(abs(values[-1] - math.sqrt(2)), 2)
-29.74808712981571

i.e. the error is around $2^{-29} \approx \sqrt{\varepsilon}$.

Trying out with a triple root $(x - 1)^3 = x^3 - 3 x^2 + 3x - 1$ we get

>>> values = [2.0]
>>> coeffs = [1, -3, 3, -1]
>>> next_val = horner_newton(coeffs, values[-1])
>>> while next_val not in values:
...     values.append(next_val)
...     next_val = horner_newton(coeffs, values[-1])
...
>>> len(values)
31
>>> values[-1]
1.0000046685597561
>>> math.log(abs(values[-1] - 1), 2)
-17.70859102007032

with an error around $2^{-17} \approx \sqrt[3]{\varepsilon}$ as predicted.