Minimize a cubicly regularized quadratic via Newton root finding. [y, iter, lambda, status] = minimize_cubic_newton(H, g, sigma, options) Inputs: a symmetric matrix H of size n, a nonzero vector g of length n, a positive real sigma and an options structure. The code expects H to be tridiagonal, stored as a sparse matrix. The main output is a vector y of length n, which should minimize f(y) = g'*y + (1/2)*y'*H*y + (1/3)*sigma*norm(y)^3. This is achieved by reducing the problem to a univariate root finding problem, where the unknown is a scalar lambda. This root is computed using a Newton method. Other outputs are iter (the number of Newton iterations completed), lambda (a real scalar, see below) and status. The latter is 0 if the target tolerance was reached, 1 if subsequent iterations induce no significant change, and -1 if the algorithm return because it reached the maximum number of iterations (see the options structure.) Non-negative status values are considered successes. The options structure must contain the following fields (between parentheses are some recommended values): options.verbosity (3): to control how much information this function prints to the command window. Anything below 6 silences the function. options.maxiter_newton (100): maximum number of Newton iterations. options.tol_newton (1e-16): tolerance on the root finding accuracy. See in code for details. The code is based on Section 6 in Cartis, Gould and Toint, "Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results", Mathematical Programming, 2011. https://link.springer.com/article/10.1007/s10107-009-0286-5 Theorem 3.1 in the referenced paper states y is optimal if and only if it there exists a real lambda such that (H + lambda*I)y = -g, lambda = sigma*||y|| and H + lambda*I is psd, where psd means positive semidefinite. The other way around, if we find the corresponding scalar lambda, than we can recover y by solving a linear system (though this system might not have a unique solution in full generality.) Thus, the general strategy is to search for lambda rather than for y. See also: arc arc_lanczos
0001 function [y, iter, lambda, status] = minimize_cubic_newton(H, g, sigma, options) 0002 % Minimize a cubicly regularized quadratic via Newton root finding. 0003 % 0004 % [y, iter, lambda, status] = minimize_cubic_newton(H, g, sigma, options) 0005 % 0006 % Inputs: a symmetric matrix H of size n, a nonzero vector g of length n, 0007 % a positive real sigma and an options structure. The code expects H to 0008 % be tridiagonal, stored as a sparse matrix. 0009 % 0010 % The main output is a vector y of length n, which should minimize 0011 % 0012 % f(y) = g'*y + (1/2)*y'*H*y + (1/3)*sigma*norm(y)^3. 0013 % 0014 % This is achieved by reducing the problem to a univariate root finding 0015 % problem, where the unknown is a scalar lambda. This root is computed 0016 % using a Newton method. 0017 % 0018 % Other outputs are iter (the number of Newton iterations completed), 0019 % lambda (a real scalar, see below) and status. The latter is 0 if the 0020 % target tolerance was reached, 1 if subsequent iterations induce no 0021 % significant change, and -1 if the algorithm return because it reached 0022 % the maximum number of iterations (see the options structure.) 0023 % Non-negative status values are considered successes. 0024 % 0025 % The options structure must contain the following fields (between 0026 % parentheses are some recommended values): 0027 % options.verbosity (3): to control how much information this function 0028 % prints to the command window. Anything below 6 silences the function. 0029 % options.maxiter_newton (100): maximum number of Newton iterations. 0030 % options.tol_newton (1e-16): tolerance on the root finding accuracy. See 0031 % in code for details. 0032 % 0033 % The code is based on Section 6 in 0034 % Cartis, Gould and Toint, "Adaptive cubic regularisation methods for 0035 % unconstrained optimization. Part I: motivation, convergence and numerical 0036 % results", Mathematical Programming, 2011. 0037 % https://link.springer.com/article/10.1007/s10107-009-0286-5 0038 % 0039 % Theorem 3.1 in the referenced paper states y is optimal if and only 0040 % if it there exists a real lambda such that 0041 % 0042 % (H + lambda*I)y = -g, lambda = sigma*||y|| and H + lambda*I is psd, 0043 % 0044 % where psd means positive semidefinite. The other way around, if we 0045 % find the corresponding scalar lambda, than we can recover y by 0046 % solving a linear system (though this system might not have a unique 0047 % solution in full generality.) Thus, the general strategy is to search 0048 % for lambda rather than for y. 0049 % 0050 % See also: arc arc_lanczos 0051 0052 % This file is part of Manopt: www.manopt.org. 0053 % Original authors: May 1, 2018, 0054 % Naman Agarwal, Brian Bullins, Nicolas Boumal and Coralia Cartis. 0055 % Contributors: 0056 % Change log: 0057 0058 n = size(H, 1); 0059 0060 % Pick an initial lambda that is cheap to compute and that surely makes 0061 % the shifted H positive definite. 0062 lambda = norm(H, 1) + 2; 0063 H_shifted = H + lambda*speye(n); 0064 0065 % Compute the smallest eigenvalue of H, as we know the target lambda 0066 % must be at least as large as the negative of that, so that the 0067 % shifted H will be positive semidefinite. 0068 % 0069 % Since H ought to be sparse and tridiagonal, and since we only need 0070 % its smallest eigenvalue, this computation could be sped up 0071 % significantly. It does not appear to be a bottleneck, and eig is 0072 % simple and reliable, so we keep this for now. 0073 lambda_min = min(eig(H)); 0074 left_barrier = max(0, -lambda_min); 0075 0076 % Counter 'iter' holds the number of fully executed Newton iterations. 0077 iter = 0; 0078 while true 0079 0080 if iter >= options.maxiter_newton 0081 % Iterations exceeded maximum number allowed. 0082 status = -1; 0083 return; 0084 end 0085 0086 % If lambda has the correct value and the shifted H is positive 0087 % definite, then this y is a minimizer. 0088 y = -(H_shifted\g); 0089 ynorm = norm(y); 0090 0091 % If the following quantity is zero, we have found a solution. 0092 phi = 1/ynorm - sigma/lambda; 0093 0094 % Check if it is close enough to zero to stop. 0095 if abs(phi) <= options.tol_newton*ynorm 0096 status = 0; 0097 return; 0098 end 0099 psi = ynorm^2; 0100 0101 % TODO: clarify this part of the code (see referenced paper). 0102 % The following is a Newton type of step on the equation 0103 % sigma/lambda = 1/sqrt(psi(lambda_prev)) ... 0104 % - (lambda - lambda_prev)((psi'(lambda_prev))/2(psi)^1.5) 0105 delta_y = -(H_shifted\y); 0106 psi_prime = 2*(y'*delta_y); 0107 p0 = 2*sigma*(psi^(1.5)); 0108 p1 = -2*psi - lambda*psi_prime; 0109 p2 = psi_prime; 0110 r = roots([p2 p1 p0]); 0111 del_lambda = max(r) - lambda; 0112 iter = iter + 1; 0113 0114 % If the Newton step would bring us left of the left barrier, jump 0115 % instead to the midpoint between the left barrier and the current 0116 % lambda. 0117 if lambda + del_lambda <= left_barrier 0118 del_lambda = -.5*(lambda - left_barrier); 0119 end 0120 0121 % If the step is so small that it numerically does not make a 0122 % difference when added to the current lambda, we stop. 0123 if abs(del_lambda) <= eps(lambda) 0124 status = 1; 0125 return; 0126 end 0127 0128 % Update lambda 0129 H_shifted = H_shifted + del_lambda*speye(n); 0130 lambda = lambda + del_lambda; 0131 0132 0133 if options.verbosity >= 6 0134 fprintf(['lambda %.12e, ||y|| %.12e, lambda/sigma %.12e, ' ... 0135 'phi %.12e\n\n'], lambda, ynorm, lambda / sigma, phi); 0136 end 0137 0138 end 0139 0140 end