function [eta, Heta, hesscalls, stop_str, stats] = arc_gradient_descent(problem, x, grad, gradnorm, sigma, options, storedb, key)
% Subproblem solver for ARC based on gradient descent.
%
% [eta, Heta, hesscalls, stop_str, stats] =
%     arc_gradient_descent(problem, x, grad, gradnorm, sigma, options, storedb, key)
%
% This routine approximately solves the following problem:
%
%   min_{eta in T_x M}  m(eta),  where
%
%       m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3
%
% where eta is a tangent vector at x on the manifold given by problem.M,
% g = grad is a tangent vector at x, H[eta] is the result of applying the
% Hessian of the problem at x along eta, and the inner product and norm
% are those from the Riemannian structure on the tangent space T_x M.
%
% The solve is approximate in the sense that the returned eta only ought
% to satisfy the following conditions:
%
%   ||gradient of m at eta|| <= theta*||eta||^2   and   m(eta) <= m(0),
%
% where theta is specified in options.theta (see below for default value).
% Since the gradient of the model at 0 is g, if it is zero, then eta = 0
% is returned. This is the only scenario where eta = 0 is returned.
%
% Numerical errors can perturb the described expected behavior.
%
% Inputs:
%     problem: Manopt optimization problem structure
%     x: point on the manifold problem.M
%     grad: gradient of the cost function of the problem at x
%     gradnorm: norm of the gradient, often available to the caller
%     sigma: cubic regularization parameter (positive scalar)
%     options: structure containing options for the subproblem solver
%     storedb, key: caching data for problem at x
%
% Options specific to this subproblem solver:
%   theta (0.5)
%     Stopping criterion parameter for the subproblem solver: the gradient
%     of the model at the returned step should have norm no more than
%     theta times the squared norm of the step.
%   maxinner (100)
%     Maximum number of iterations of the gradient descent algorithm.
%
% Outputs:
%     eta: approximate solution to the cubic regularized subproblem at x
%     Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
%           is often naturally available to the subproblem solver at the
%           end of execution, so that it may be cheaper to return it here.
%     hesscalls: number of Hessian calls during execution
%     stop_str: string describing why the subsolver stopped
%     stats: a structure specifying some statistics about inner work --
%            we record the model cost value and model gradient norm at
%            each inner iteration.

% This file is part of Manopt: www.manopt.org.
% Original authors: May 2, 2019,
%    Bryan Zhu, Nicolas Boumal.
% Contributors:
% Change log:
%
%   Aug. 19, 2019 (NB):
%       Option maxiter_gradient renamed to maxinner to match trustregions.
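
% Implementation outline (restated from the code below, for reference):
%
%   1. Initialize eta at the Cauchy point, that is, the exact minimizer of
%      the model m restricted to the ray {-r*grad/gradnorm : r >= 0}. The
%      radius R_c is the positive root of
%
%          sigma*R_c^2 + (<grad, H[grad]>/gradnorm^2)*R_c - gradnorm = 0.
%
%   2. Iterate: form the negative model gradient
%
%          -grad m(eta) = -(grad + H[eta] + sigma*||eta||*eta),
%
%      stop if its norm is at most theta*||eta||^2, and otherwise take an
%      exact step along it by minimizing the one-dimensional cubic model
%      via solve_along_line.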

    % Some shortcuts
    M = problem.M;
    inner = @(u, v) M.inner(x, u, v);
    rnorm = @(u) M.norm(x, u);
    tangent = @(u) M.tangent(x, u);
    Hess = @(u) getHessian(problem, x, u, storedb, key);

    % Counter for Hessian calls issued
    hesscalls = 0;

    % If the gradient has norm zero, return a zero step
    if gradnorm == 0
        eta = M.zerovec(x);
        Heta = eta;
        stop_str = 'Cost gradient is zero';
        stats = struct('gradnorms', 0, 'func_values', 0);
        return;
    end

    % Set local defaults here
    localdefaults.theta = 0.5;
    localdefaults.maxinner = 100;

    % Merge local defaults with user options, if any
    if ~exist('options', 'var') || isempty(options)
        options = struct();
    end
    options = mergeOptions(localdefaults, options);

    % Calculate the Cauchy point as our initial step
    hess_grad = Hess(grad);
    hesscalls = hesscalls + 1;
    temp = inner(grad, hess_grad) / (2 * sigma * gradnorm * gradnorm);
    R_c = -temp + sqrt(temp * temp + gradnorm / sigma);
    eta = M.lincomb(x, -R_c / gradnorm, grad);
    Heta = M.lincomb(x, -R_c / gradnorm, hess_grad);

    % Main gradient descent iteration
    gradnorms = zeros(options.maxinner, 1);
    func_values = zeros(options.maxinner, 1);
    gradnorm_reached = false;
    j = 1;
    while j < options.maxinner
        % Calculate the gradient of the model
        eta_norm = rnorm(eta);
        neg_mgrad = M.lincomb(x, 1, Heta, 1, grad);
        neg_mgrad = M.lincomb(x, -1, neg_mgrad, -sigma * eta_norm, eta);
        neg_mgrad = tangent(neg_mgrad);

        % Compute some statistics
        gradnorms(j) = rnorm(neg_mgrad);
        func_values(j) = inner(grad, eta) + 0.5 * inner(eta, Heta) + (sigma/3) * eta_norm^3;

        % Check termination condition
        if rnorm(neg_mgrad) <= options.theta * eta_norm^2
            stop_str = 'Model grad norm condition satisfied';
            gradnorm_reached = true;
            break;
        end

        % Find the optimal step in the negative direction of the gradient
        Hnmgrad = Hess(neg_mgrad);
        hesscalls = hesscalls + 1;
        step = solve_along_line(M, x, eta, neg_mgrad, grad, Hnmgrad, sigma);
        if step == 0
            stop_str = 'Failed optimal increase';
            gradnorm_reached = true;
            break;
        end
        eta = M.lincomb(x, 1, eta, step, neg_mgrad);
        Heta = M.lincomb(x, 1, Heta, step, Hnmgrad);
        j = j + 1;
    end

    % Check why we stopped iterating
    if ~gradnorm_reached
        stop_str = sprintf(['Reached max number of gradient descent iterations ' ...
                            '(options.maxinner = %d)'], options.maxinner);
        j = j - 1;
    end

    % Return the point we ended on
    eta = tangent(eta);
    stats = struct('gradnorms', gradnorms(1:j), 'func_values', func_values(1:j));
    if options.verbosity >= 4
        fprintf('\n');
    end

end
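
% Usage sketch (illustrative only, not part of this file): this subproblem
% solver is normally not called directly. Within Manopt it can be selected
% from the arc solver; the snippet below assumes the standard spherefactory
% manifold and that arc exposes a 'subproblemsolver' option, as in recent
% Manopt releases. Minimizing x'*A*x over the unit sphere:
%
%     n = 100;
%     A = randn(n); A = (A + A')/2;        % random symmetric matrix
%     problem.M = spherefactory(n);        % unit sphere in R^n
%     problem.cost  = @(x) x'*(A*x);
%     problem.egrad = @(x) 2*(A*x);        % Euclidean gradient
%     problem.ehess = @(x, xdot) 2*(A*xdot);
%     opts.subproblemsolver = @arc_gradient_descent;
%     opts.theta = 0.5;                    % subproblem stopping parameter
%     opts.maxinner = 100;                 % cap on inner iterations
%     x = arc(problem, [], opts);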