Newton's Method for Function Minimization ----------------------------------------- The common Newton iteration for finding a zero of a function f:R->R is given by the formula x_new:=x_old-f(x_old)/f'(x_old) Defining the Newton step h=x_new-x_old this can be written as f'(x_old)*h=-f(x_old) The Newton method can be generalized to a function f:R^n->R^n. The corresponding formula will be J(x_old)*h=-f(x_old) where x_old, h and f(x_old) are n-vectors and the Jacobian (Jacobi-Matrix) J(x_old) of f is an nxn-matrix. Recall that the Jacobian of a function f:R^n->R^m is given by / \ | partial_f_1(x)/partial_x_1 ... partial_f_1(x)/partial_x_n | J(x)=| | | partial_f_m(x)/partial_x_1 ... partial_f_m(x)/partial_x_n | \ / Notice that in the multidimensional case, a linear equation system has to be solved in each iteration step. However in our context we use Newton's method not to find a zero, but to minimize the function f. This is achieved by applying the above iteration to the derivative f', so the unidimensional formula will be x_new:=x_old-f'(x_old)/f''(x_old) Here is another way to come to this formula: Recall that the extremum of a parabola a*x^2+b*x+c is at x=-b/(2*a) . Now approximate the function f in x by a parabola (given by the truncated series expansion) f(x+h)=f(x)+h*f'(x)+h^2*f''(x)/2 The extremum is at h=-f'(x)/f''(x) Again, the unidimensional formula can be generalized to a function f:R^n->R. The corresponding formula will be H(x_old)*h=-f'(x_old) where H is the Hessian (Hesse-Matrix) of f and f' is the gradient of f. Here x_old, h and f'(x_old) are n-vectors and H(x_old) is an nxn-matrix. Recall that the Hessian of a function f:R^n->R is given by / \ | partial^2_f(x)/(partial_x_1*partial_x_1) ... partial^2_f(x)/(partial_x_1*partial_x_n) | H(x)=| | | partial^2_f(x)/(partial_x_n*partial_x_1) ... partial^2_f(x)/(partial_x_n*partial_x_n) | \ / and that it is equal to the Jacobian of the gradient of f.