arc_gradient_descent

PURPOSE

Subproblem solver for ARC based on gradient descent.

SYNOPSIS

function [eta, Heta, hesscalls, stop_str, stats] = arc_gradient_descent(problem, x, grad, gradnorm, sigma, options, storedb, key)

DESCRIPTION

 Subproblem solver for ARC based on gradient descent.

 [eta, Heta, hesscalls, stop_str, stats] = 
     arc_gradient_descent(problem, x, grad, gradnorm, sigma, options, storedb, key)

 This routine approximately solves the following problem:

   min_{eta in T_x M}  m(eta),  where

       m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3

 where eta is a tangent vector at x on the manifold given by problem.M,
 g = grad is a tangent vector at x, H[eta] is the result of applying the
 Hessian of the problem at x along eta, and the inner product and norm
 are those of the Riemannian structure on the tangent space T_x M.
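
 As a point of reference (this snippet is illustrative and not part of the
 file; it assumes x, grad, sigma and a tangent vector eta are in scope),
 the model value m(eta) can be evaluated with standard Manopt tools,
 mirroring what the source code below does internally:

     M = problem.M;
     Heta  = getHessian(problem, x, eta);      % H[eta]: one Hessian call
     m_eta = M.inner(x, eta, grad) ...
             + 0.5 * M.inner(x, eta, Heta) ...
             + (sigma/3) * M.norm(x, eta)^3;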

 The solve is approximate in the sense that the returned eta only ought
 to satisfy the following conditions:

   ||gradient of m at eta|| <= theta*||eta||^2   and   m(eta) <= m(0),

 where theta is specified in options.theta (see below for the default
 value). Since the gradient of the model at 0 is g, eta = 0 is returned
 if g is zero; this is the only scenario in which eta = 0 is returned.

 Numerical errors can perturb the described expected behavior.
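
 Concretely, the gradient of the model at eta is
 grad m(eta) = g + H[eta] + sigma*||eta||*eta, so the first condition can
 be checked as in the sketch below (illustrative only; it assumes Heta =
 H[eta] is already available, as it is inside the solver):

     M = problem.M;
     eta_norm = M.norm(x, eta);
     mgrad = M.lincomb(x, 1, grad, 1, Heta);               % g + H[eta]
     mgrad = M.lincomb(x, 1, mgrad, sigma*eta_norm, eta);  % + sigma*||eta||*eta
     mgrad = M.tangent(x, mgrad);                          % numerical safeguard
     small_enough = (M.norm(x, mgrad) <= options.theta * eta_norm^2);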

 Inputs:
     problem: Manopt optimization problem structure
     x: point on the manifold problem.M
     grad: gradient of the cost function of the problem at x
     gradnorm: norm of the gradient, often available to the caller
     sigma: cubic regularization parameter (positive scalar)
     options: structure containing options for the subproblem solver
     storedb, key: caching data for problem at x
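
 For illustration only (end users normally let the arc solver do this), the
 arguments above could be assembled roughly as the calling solver would:

     storedb = StoreDB(2);                      % caching database
     key = storedb.getNewKey();                 % cache key associated with x
     grad = getGradient(problem, x, storedb, key);
     gradnorm = problem.M.norm(x, grad);
     sigma = 10;                                % hypothetical regularization value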

 Options specific to this subproblem solver:
   theta (0.5)
     Stopping criterion parameter for subproblem solver: the gradient of
     the model at the returned step should have norm no more than theta
     times the squared norm of the step.
   maxinner (100)
     Maximum number of iterations of the gradient descent algorithm.
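
 For example, these options would typically be set on the options structure
 handed to the calling solver, which passes it on to this subproblem solver:

     options = struct();
     options.theta = 0.25;      % stricter tolerance than the default 0.5
     options.maxinner = 200;    % more inner iterations than the default 100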

 Outputs:
     eta: approximate solution to the cubic regularized subproblem at x
     Heta: Hess f(x)[eta] -- this is needed in the outer loop, and since
           it is often naturally available to the subproblem solver at the
           end of its run, returning it here can spare a Hessian call.
     hesscalls: number of Hessian calls during execution
     stop_str: string describing why the subsolver stopped
     stats: a structure with some statistics about the inner work: the
            model cost value and the model gradient norm are recorded at
            each inner iteration.
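
 This routine is normally not called directly: the arc solver invokes it as
 its subproblem solver. The sketch below is illustrative; it assumes the arc
 option for selecting the subsolver is named subproblemsolver and that
 problem already defines a manifold, cost and derivatives:

     opts = struct();
     opts.subproblemsolver = @arc_gradient_descent;  % use this subsolver
     opts.theta = 0.5;
     opts.maxinner = 100;
     [xopt, fopt, info] = arc(problem, [], opts);    % [] = default initial guess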

CROSS-REFERENCE INFORMATION

This function calls: getHessian, mergeOptions, solve_along_line (see the source code below).
This function is called by: the arc solver, which uses it as one of its cubic-regularization subproblem solvers.

SOURCE CODE

0001 function [eta, Heta, hesscalls, stop_str, stats] = arc_gradient_descent(problem, x, grad, gradnorm, sigma, options, storedb, key)
0002 % Subproblem solver for ARC based on gradient descent.
0003 %
0004 % [eta, Heta, hesscalls, stop_str, stats] =
0005 %     arc_gradient_descent(problem, x, grad, gradnorm, sigma, options, storedb, key)
0006 %
0007 % This routine approximately solves the following problem:
0008 %
0009 %   min_{eta in T_x M}  m(eta),  where
0010 %
0011 %       m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3
0012 %
0013 % where eta is a tangent vector at x on the manifold given by problem.M,
0014 % g = grad is a tangent vector at x, H[eta] is the result of applying the
0015 % Hessian of the problem at x along eta and the inner product and norm
0016 % are those from the Riemannian structure on the tangent space T_x M.
0017 %
0018 % The solve is approximate in the sense that the returned eta only ought
0019 % to satisfy the following conditions:
0020 %
0021 %   ||gradient of m at eta|| <= theta*||eta||^2   and   m(eta) <= m(0),
0022 %
0023 % where theta is specified in options.theta (see below for default value.)
0024 % Since the gradient of the model at 0 is g, if it is zero, then eta = 0
0025 % is returned. This is the only scenario where eta = 0 is returned.
0026 %
0027 % Numerical errors can perturb the described expected behavior.
0028 %
0029 % Inputs:
0030 %     problem: Manopt optimization problem structure
0031 %     x: point on the manifold problem.M
0032 %     grad: gradient of the cost function of the problem at x
0033 %     gradnorm: norm of the gradient, often available to the caller
0034 %     sigma: cubic regularization parameter (positive scalar)
0035 %     options: structure containing options for the subproblem solver
0036 %     storedb, key: caching data for problem at x
0037 %
0038 % Options specific to this subproblem solver:
0039 %   theta (0.5)
0040 %     Stopping criterion parameter for subproblem solver: the gradient of
0041 %     the model at the returned step should have norm no more than theta
0042 %     times the squared norm of the step.
0043 %   maxinner (100)
0044 %     Maximum number of iterations of the gradient descent algorithm.
0045 %
0046 % Outputs:
0047 %     eta: approximate solution to the cubic regularized subproblem at x
0048 %     Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
0049 %           is often naturally available to the subproblem solver at the
0050 %           end of execution, so that it may be cheaper to return it here.
0051 %     hesscalls: number of Hessian calls during execution
0052 %     stop_str: string describing why the subsolver stopped
0053 %     stats: a structure specifying some statistics about inner work -
0054 %            we record the model cost value and model gradient norm at each
0055 %            inner iteration.
0056 
0057 % This file is part of Manopt: www.manopt.org.
0058 % Original authors: May 2, 2019,
0059 %    Bryan Zhu, Nicolas Boumal.
0060 % Contributors:
0061 % Change log:
0062 %
0063 %   Aug. 19, 2019 (NB):
0064 %       Option maxiter_gradient renamed to maxinner to match trustregions.
0065 
0066     % Some shortcuts
0067     M = problem.M;
0068     inner = @(u, v) M.inner(x, u, v);
0069     rnorm = @(u) M.norm(x, u);
0070     tangent = @(u) problem.M.tangent(x, u);
0071     Hess = @(u) getHessian(problem, x, u, storedb, key);
0072     
0073     % Counter for Hessian calls issued
0074     hesscalls = 0;
0075     
0076     % If the gradient has norm zero, return a zero step
0077     if gradnorm == 0
0078         eta = M.zerovec(x);
0079         Heta = eta;
0080         stop_str = 'Cost gradient is zero';
0081         stats = struct('gradnorms', 0, 'func_values', 0);
0082         return;
0083     end
0084     
0085     % Set local defaults here
0086     localdefaults.theta = 0.5;
0087     localdefaults.maxinner = 100;
0088     
0089     % Merge local defaults with user options, if any
0090     if ~exist('options', 'var') || isempty(options)
0091         options = struct();
0092     end
0093     options = mergeOptions(localdefaults, options);
0094     
0095     % Calculate the Cauchy point as our initial step
0096     hess_grad = Hess(grad);
0097     hesscalls = hesscalls + 1;
0098     temp = inner(grad, hess_grad) / (2 * sigma * gradnorm * gradnorm);    
0099     R_c = -temp + sqrt(temp * temp + gradnorm / sigma);
0100     eta = M.lincomb(x, -R_c / gradnorm, grad);
0101     Heta = M.lincomb(x, -R_c / gradnorm, hess_grad);
0102     
0103     % Main gradient descent iteration
0104     gradnorms = zeros(options.maxinner, 1);
0105     func_values = zeros(options.maxinner, 1);
0106     gradnorm_reached = false;
0107     j = 1;
0108     while j < options.maxinner
0109         % Calculate the gradient of the model
0110         eta_norm = rnorm(eta);
0111         neg_mgrad = M.lincomb(x, 1, Heta, 1, grad);
0112         neg_mgrad = M.lincomb(x, -1, neg_mgrad, -sigma * eta_norm, eta);
0113         neg_mgrad = tangent(neg_mgrad);
0114         
0115         % Compute some statistics
0116         gradnorms(j) = rnorm(neg_mgrad);
0117         func_values(j) = inner(grad, eta) + 0.5 * inner(eta, Heta) + (sigma/3) * eta_norm^3;
0118 
0119         % Check termination condition
0120         if rnorm(neg_mgrad) <= options.theta * eta_norm^2
0121             stop_str = 'Model grad norm condition satisfied';
0122             gradnorm_reached = true;
0123             break;
0124         end
0125         
0126         % Find the optimal step in the negative direction of the gradient
0127         Hnmgrad = Hess(neg_mgrad);
0128         hesscalls = hesscalls + 1;
0129         step = solve_along_line(M, x, eta, neg_mgrad, grad, Hnmgrad, sigma);
0130         if step == 0
0131             stop_str = 'Failed optimal increase';
0132             gradnorm_reached = true;
0133             break;
0134         end
0135         eta = M.lincomb(x, 1, eta, step, neg_mgrad);
0136         Heta = M.lincomb(x, 1, Heta, step, Hnmgrad);
0137         j = j + 1;
0138     end
0139     
0140     % Check why we stopped iterating
0141     if ~gradnorm_reached
0142         stop_str = sprintf(['Reached max number of gradient descent iterations ' ...
0143                '(options.maxinner = %d)'], options.maxinner);
0144         j = j - 1;
0145     end
0146     
0147     % Return the point we ended on
0148     eta = tangent(eta);
0149     stats = struct('gradnorms', gradnorms(1:j), 'func_values', func_values(1:j));    
0150     if options.verbosity >= 4
0151         fprintf('\n');
0152     end
0153         
0154 end
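
A note on the initialization (source lines 0095-0101): eta starts at the
Cauchy point, that is, the minimizer of the model along the steepest-descent
ray. The file does not spell out the derivation of R_c; the comment-level
sketch below reconstructs it, reusing the shortcuts inner, hess_grad and
gradnorm defined in the source above:

    % Restrict the model to eta(t) = -(t/gradnorm) * grad with t >= 0:
    %   phi(t) = -t*gradnorm + (t^2/2) * <g, H[g]>/gradnorm^2 + (sigma/3)*t^3.
    % Setting phi'(t) = 0 gives sigma*t^2 + (<g, H[g]>/gradnorm^2)*t - gradnorm = 0,
    % whose unique positive root is the Cauchy radius used in the source:
    temp = inner(grad, hess_grad) / (2 * sigma * gradnorm^2);
    R_c  = -temp + sqrt(temp^2 + gradnorm / sigma);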

Generated on Fri 30-Sep-2022 13:18:25 by m2html © 2005