
conjugategradient

PURPOSE

Conjugate gradient minimization algorithm for Manopt.

SYNOPSIS

function [x, cost, info, options] = conjugategradient(problem, x, options)

DESCRIPTION

 Conjugate gradient minimization algorithm for Manopt.

 function [x, cost, info, options] = conjugategradient(problem)
 function [x, cost, info, options] = conjugategradient(problem, x0)
 function [x, cost, info, options] = conjugategradient(problem, x0, options)
 function [x, cost, info, options] = conjugategradient(problem, [], options)

 Apply the conjugate gradient minimization algorithm to the problem
 defined in the problem structure, starting at x0 if it is provided
 (otherwise, at a random point on the manifold). To specify options whilst
 not specifying an initial guess, give x0 as [] (the empty matrix).
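
 For illustration only, here is a minimal usage sketch. The cost function
 (a Rayleigh quotient on the unit sphere) and the matrix A are assumptions
 made up for this example; they are not part of this solver:

   % Example: approximate a leading eigenvector of a symmetric matrix A
   % by minimizing -x'*A*x over the unit sphere in R^n.
   n = 100;
   A = randn(n); A = (A + A')/2;      % illustrative data
   problem.M = spherefactory(n);      % manifold: unit sphere in R^n
   problem.cost  = @(x) -x'*(A*x);    % cost to minimize
   problem.egrad = @(x) -2*A*x;       % Euclidean gradient of the cost
   [x, cost, info, options] = conjugategradient(problem);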

 The outputs x and cost are the best reached point on the manifold and its
 cost. The struct-array info contains information about the iterations:
   iter : the iteration number (0 for the initial guess)
   cost : cost value
   time : elapsed time in seconds
   gradnorm : Riemannian norm of the gradient
   stepsize : norm of the last tangent vector retracted
   beta : value of the beta parameter (see options.beta_type)
   linesearch : information logged by options.linesearch
   And possibly additional information logged by options.statsfun.
 For example, type [info.gradnorm] to obtain a vector of the successive
 gradient norms reached.
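
 As a further illustration, a convergence plot of the gradient norm can be
 produced from info with standard MATLAB commands:

   semilogy([info.iter], [info.gradnorm], '.-');
   xlabel('Iteration number');
   ylabel('Norm of the gradient of f');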

 The options structure is used to overwrite the default values. All
 options have a default value and are hence optional. To force an option
 value, pass an options structure with a field options.optionname, where
 optionname is one of the following and the default value is indicated
 between parentheses (a usage example follows this list):

   tolgradnorm (1e-6)
       The algorithm terminates if the norm of the gradient drops below this.
   maxiter (1000)
       The algorithm terminates if maxiter iterations have been executed.
   maxtime (Inf)
       The algorithm terminates if maxtime seconds have elapsed.
   minstepsize (1e-10)
       The algorithm terminates if the linesearch returns a displacement
       vector (to be retracted) smaller in norm than this value.
   beta_type ('H-S')
       Conjugate gradient beta rule used to construct the new search
       direction, based on a linear combination of the previous search
       direction and the new (preconditioned) gradient. Possible values
       for this parameter are:
           'S-D', 'steep' for beta = 0 (preconditioned steepest descent)
           'F-R' for Fletcher-Reeves's rule
           'P-R' for Polak-Ribiere's modified rule
           'H-S' for Hestenes-Stiefel's modified rule
           'H-Z' for Hager-Zhang's modified rule
       See Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
       methods" for a description of these rules in the Euclidean case and
       for an explanation of how to adapt them to the preconditioned case.
       The adaptation to the Riemannian case is straightforward: see the
       code for details. Modified rules take the max between 0 and the
       computed beta value, which provides automatic restart, except for
       H-Z, which uses a different modification.
   orth_value (Inf)
       Following Powell's restart strategy (Math. Prog. 1977), restart CG
       (that is, make a preconditioned gradient step) if two successive
       preconditioned gradients are "too" parallel. See for example
       Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
       methods", page 12. An infinite value disables this strategy. See
       the formula in the code for the specific criterion used.
   linesearch (@linesearch_adaptive or @linesearch_hint)
       Function handle to a line search function. The options structure is
       passed to the line search too, so you can pass it parameters. See
       each line search's documentation for info. Another available line
       search in Manopt is @linesearch, in /manopt/linesearch/linesearch.m.
       If the problem structure includes a line search hint, then the
       default line search used is @linesearch_hint.
   statsfun (none)
       Function handle to a function that will be called after each
       iteration to provide the opportunity to log additional statistics.
       They will be returned in the info struct. See the generic Manopt
       documentation about solvers for further information.
   stopfun (none)
       Function handle to a function that will be called at each iteration
       to provide the opportunity to specify additional stopping criteria.
       See the generic Manopt documentation about solvers for further
       information.
   verbosity (3)
       Integer number used to tune the amount of output the algorithm
       generates during execution (mostly as text in the command window).
       The higher, the more output. 0 means silent.
   storedepth (2)
       Maximum number of different points x of the manifold for which a
       store structure will be kept in memory in the storedb. If the
       caching features of Manopt are not used, this is irrelevant. For
       the CG algorithm, a store depth of 2 should always be sufficient.
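
 Here is the usage sketch announced above for the options structure. The
 specific values and the logged statistic are arbitrary choices made for
 this illustration; the statsfun signature follows the generic Manopt
 solver documentation:

   options.maxiter = 500;              % at most 500 iterations
   options.tolgradnorm = 1e-8;         % tighter gradient norm tolerance
   options.beta_type = 'F-R';          % use the Fletcher-Reeves rule
   options.linesearch = @linesearch;   % use the basic line search
   options.verbosity = 1;              % reduce text output
   % Log the norm of each iterate (assumes x is a matrix or vector).
   options.statsfun = @(problem, x, stats) ...
                          setfield(stats, 'xnorm', norm(x(:)));
   [x, cost, info] = conjugategradient(problem, [], options);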


 In most of the examples bundled with the toolbox (see link below), the
 solver can be replaced by the present one if need be.

 See also: steepestdescent trustregions manopt/solvers/linesearch manopt/examples

SUBFUNCTIONS

function stats = savestats()

SOURCE CODE

0001 function [x, cost, info, options] = conjugategradient(problem, x, options)
0002 % Conjugate gradient minimization algorithm for Manopt.
0003 %
0004 % function [x, cost, info, options] = conjugategradient(problem)
0005 % function [x, cost, info, options] = conjugategradient(problem, x0)
0006 % function [x, cost, info, options] = conjugategradient(problem, x0, options)
0007 % function [x, cost, info, options] = conjugategradient(problem, [], options)
0008 %
0009 % Apply the conjugate gradient minimization algorithm to the problem
0010 % defined in the problem structure, starting at x0 if it is provided
0011 % (otherwise, at a random point on the manifold). To specify options whilst
0012 % not specifying an initial guess, give x0 as [] (the empty matrix).
0013 %
0014 % The outputs x and cost are the best reached point on the manifold and its
0015 % cost. The struct-array info contains information about the iterations:
0016 %   iter : the iteration number (0 for the initial guess)
0017 %   cost : cost value
0018 %   time : elapsed time in seconds
0019 %   gradnorm : Riemannian norm of the gradient
0020 %   stepsize : norm of the last tangent vector retracted
0021 %   beta : value of the beta parameter (see options.beta_type)
0022 %   linesearch : information logged by options.linesearch
0023 %   And possibly additional information logged by options.statsfun.
0024 % For example, type [info.gradnorm] to obtain a vector of the successive
0025 % gradient norms reached.
0026 %
0027 % The options structure is used to overwrite the default values. All
0028 % options have a default value and are hence optional. To force an option
0029 % value, pass an options structure with a field options.optionname, where
0030 % optionname is one of the following and the default value is indicated
0031 % between parentheses:
0032 %
0033 %   tolgradnorm (1e-6)
0034 %       The algorithm terminates if the norm of the gradient drops below this.
0035 %   maxiter (1000)
0036 %       The algorithm terminates if maxiter iterations have been executed.
0037 %   maxtime (Inf)
0038 %       The algorithm terminates if maxtime seconds have elapsed.
0039 %   minstepsize (1e-10)
0040 %       The algorithm terminates if the linesearch returns a displacement
0041 %       vector (to be retracted) smaller in norm than this value.
0042 %   beta_type ('H-S')
0043 %       Conjugate gradient beta rule used to construct the new search
0044 %       direction, based on a linear combination of the previous search
0045 %       direction and the new (preconditioned) gradient. Possible values
0046 %       for this parameter are:
0047 %           'S-D', 'steep' for beta = 0 (preconditioned steepest descent)
0048 %           'F-R' for Fletcher-Reeves's rule
0049 %           'P-R' for Polak-Ribiere's modified rule
0050 %           'H-S' for Hestenes-Stiefel's modified rule
0051 %           'H-Z' for Hager-Zhang's modified rule
0052 %       See Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
0053 %       methods" for a description of these rules in the Euclidean case and
0054 %       for an explanation of how to adapt them to the preconditioned case.
0055 %       The adaptation to the Riemannian case is straightforward: see the
0056 %       code for details. Modified rules take the max between 0 and the
0057 %       computed beta value, which provides automatic restart, except for
0058 %       H-Z, which uses a different modification.
0059 %   orth_value (Inf)
0060 %       Following Powell's restart strategy (Math. Prog. 1977), restart CG
0061 %       (that is, make a preconditioned gradient step) if two successive
0062 %       preconditioned gradients are "too" parallel. See for example
0063 %       Hager and Zhang 2006, "A survey of nonlinear conjugate gradient
0064 %       methods", page 12. An infinite value disables this strategy. See
0065 %       the formula in the code for the specific criterion used.
0066 %   linesearch (@linesearch_adaptive or @linesearch_hint)
0067 %       Function handle to a line search function. The options structure is
0068 %       passed to the line search too, so you can pass it parameters. See
0069 %       each line search's documentation for info. Another available line
0070 %       search in Manopt is @linesearch, in /manopt/linesearch/linesearch.m.
0071 %       If the problem structure includes a line search hint, then the
0072 %       default line search used is @linesearch_hint.
0073 %   statsfun (none)
0074 %       Function handle to a function that will be called after each
0075 %       iteration to provide the opportunity to log additional statistics.
0076 %       They will be returned in the info struct. See the generic Manopt
0077 %       documentation about solvers for further information.
0078 %   stopfun (none)
0079 %       Function handle to a function that will be called at each iteration
0080 %       to provide the opportunity to specify additional stopping criteria.
0081 %       See the generic Manopt documentation about solvers for further
0082 %       information.
0083 %   verbosity (3)
0084 %       Integer number used to tune the amount of output the algorithm
0085 %       generates during execution (mostly as text in the command window).
0086 %       The higher, the more output. 0 means silent.
0087 %   storedepth (2)
0088 %       Maximum number of different points x of the manifold for which a
0089 %       store structure will be kept in memory in the storedb. If the
0090 %       caching features of Manopt are not used, this is irrelevant. For
0091 %       the CG algorithm, a store depth of 2 should always be sufficient.
0092 %
0093 %
0094 % In most of the examples bundled with the toolbox (see link below), the
0095 % solver can be replaced by the present one if need be.
0096 %
0097 % See also: steepestdescent trustregions manopt/solvers/linesearch manopt/examples
0098 
0099 % An explicit, general listing of this algorithm, with preconditioning,
0100 % can be found in the following paper:
0101 %     @Article{boumal2015lowrank,
0102 %       Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
0103 %       Author  = {Boumal, N. and Absil, P.-A.},
0104 %       Journal = {Linear Algebra and its Applications},
0105 %       Year    = {2015},
0106 %       Pages   = {200--239},
0107 %       Volume  = {475},
0108 %       Doi     = {10.1016/j.laa.2015.02.027},
0109 %     }
0110 
0111 % This file is part of Manopt: www.manopt.org.
0112 % Original author: Bamdev Mishra, Dec. 30, 2012.
0113 % Contributors: Nicolas Boumal
0114 % Change log:
0115 %
0116 %   March 14, 2013, NB:
0117 %       Added preconditioner support : see Section 8 in
0118 %       https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
0119 %
0120 %   Sept. 13, 2013, NB:
0121 %       Now logging beta parameter too.
0122 %
0123 %    Nov. 7, 2013, NB:
0124 %       The search direction is no longer normalized before it is passed
0125 %       to the linesearch. This way, it is up to the designers of the
0126 %       linesearch to decide whether they want to use the norm of the
0127 %       search direction in their algorithm or not. There are reasons
0128 %       against it, but practical evidence that it may help too, so we
0129 %       allow it. The default linesearch_adaptive used does exploit the
0130 %       norm information. The base linesearch does not. You may select it
0131 %       by setting options.linesearch = @linesearch;
0132 %
0133 %    Nov. 29, 2013, NB:
0134 %       Documentation improved: options are now explicitly described.
0135 %       Removed the Daniel rule for beta: it was not appropriate for
0136 %       preconditioned CG and I could not find a proper reference for it.
0137 %
0138 %   April 3, 2015 (NB):
0139 %       Works with the new StoreDB class system.
0140 %
0141 %   Aug. 2, 2018 (NB):
0142 %       Now using storedb.remove() to keep the cache lean.
0143 
0144 M = problem.M;
0145 
0146 % Verify that the problem description is sufficient for the solver.
0147 if ~canGetCost(problem)
0148     warning('manopt:getCost', ...
0149         'No cost provided. The algorithm will likely abort.');
0150 end
0151 if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
0152     warning('manopt:getGradient:approx', ...
0153            ['No gradient provided. Using an FD approximation instead (slow).\n' ...
0154             'It may be necessary to increase options.tolgradnorm.\n' ...
0155             'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
0156     problem.approxgrad = approxgradientFD(problem);
0157 end
0158 
0159 % Set local defaults here
0160 localdefaults.minstepsize = 1e-10;
0161 localdefaults.maxiter = 1000;
0162 localdefaults.tolgradnorm = 1e-6;
0163 localdefaults.storedepth = 20;
0164 % Changed by NB : H-S has the "auto restart" property.
0165 % See Hager-Zhang 2005/2006 survey about CG methods.
0166 % The auto restart comes from the 'max(0, ...)', not so much from the
0167 % reason stated in Hager-Zhang I think. P-R also has auto restart.
0168 localdefaults.beta_type = 'H-S';
0169 localdefaults.orth_value = Inf; % by BM as suggested in Nocedal and Wright
0170 
0171     
0172 % Depending on whether the problem structure specifies a hint for
0173 % line-search algorithms, choose a default line-search that works on
0174 % its own (typical) or that uses the hint.
0175 if ~canGetLinesearch(problem)
0176     localdefaults.linesearch = @linesearch_adaptive;
0177 else
0178     localdefaults.linesearch = @linesearch_hint;
0179 end
0180 
0181 % Merge global and local defaults, then merge w/ user options, if any.
0182 localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
0183 if ~exist('options', 'var') || isempty(options)
0184     options = struct();
0185 end
0186 options = mergeOptions(localdefaults, options);
0187 
0188 timetic = tic();
0189 
0190 % If no initial point x is given by the user, generate one at random.
0191 if ~exist('x', 'var') || isempty(x)
0192     x = M.rand();
0193 end
0194 
0195 % Create a store database and generate a key for the current x
0196 storedb = StoreDB(options.storedepth);
0197 key = storedb.getNewKey();
0198 
0199 % Compute cost-related quantities for x
0200 [cost, grad] = getCostGrad(problem, x, storedb, key);
0201 gradnorm = M.norm(x, grad);
0202 Pgrad = getPrecon(problem, x, grad, storedb, key);
0203 gradPgrad = M.inner(x, grad, Pgrad);
0204 
0205 % Iteration counter (at any point, iter is the number of fully executed
0206 % iterations so far)
0207 iter = 0;
0208 
0209 % Save stats in a struct array info and preallocate.
0210 stats = savestats();
0211 info(1) = stats;
0212 info(min(10000, options.maxiter+1)).iter = [];
0213 
0214 
0215 if options.verbosity >= 2
0216     fprintf(' iter\t               cost val\t    grad. norm\n');
0217 end
0218 
0219 % Compute a first descent direction (not normalized)
0220 desc_dir = M.lincomb(x, -1, Pgrad);
0221 
0222 
0223 % Start iterating until stopping criterion triggers
0224 while true
0225     
0226     % Display iteration information
0227     if options.verbosity >= 2
0228         fprintf('%5d\t%+.16e\t%.8e\n', iter, cost, gradnorm);
0229     end
0230     
0231     % Start timing this iteration
0232     timetic = tic();
0233     
0234     % Run standard stopping criterion checks
0235     [stop, reason] = stoppingcriterion(problem, x, options, info, iter+1);
0236     
0237     % Run specific stopping criterion check
0238     if ~stop && abs(stats.stepsize) < options.minstepsize
0239         stop = true;
0240         reason = sprintf(['Last stepsize smaller than minimum '  ...
0241                           'allowed; options.minstepsize = %g.'], ...
0242                           options.minstepsize);
0243     end
0244     
0245     if stop
0246         if options.verbosity >= 1
0247             fprintf([reason '\n']);
0248         end
0249         break;
0250     end
0251     
0252     
0253     % The line search algorithms require the directional derivative of the
0254     % cost at the current point x along the search direction.
0255     df0 = M.inner(x, grad, desc_dir);
0256         
0257     % If we didn't get a descent direction: restart, i.e., switch to the
0258     % negative gradient. Equivalent to resetting the CG direction to a
0259     % steepest descent step, which discards the past information.
0260     if df0 >= 0
0261         
0262         % Or we switch to the negative gradient direction.
0263         if options.verbosity >= 3
0264             fprintf(['Conjugate gradient info: got an ascent direction '...
0265                      '(df0 = %2e), reset to the (preconditioned) '...
0266                      'steepest descent direction.\n'], df0);
0267         end
0268         % Reset to negative gradient: this discards the CG memory.
0269         desc_dir = M.lincomb(x, -1, Pgrad);
0270         df0 = -gradPgrad;
0271         
0272     end
0273     
0274     
0275     % Execute line search
0276     [stepsize, newx, newkey, lsstats] = options.linesearch( ...
0277                    problem, x, desc_dir, cost, df0, options, storedb, key);
0278                
0279     
0280     % Compute the new cost-related quantities for newx
0281     [newcost, newgrad] = getCostGrad(problem, newx, storedb, newkey);
0282     newgradnorm = M.norm(newx, newgrad);
0283     Pnewgrad = getPrecon(problem, newx, newgrad, storedb, newkey);
0284     newgradPnewgrad = M.inner(newx, newgrad, Pnewgrad);
0285     
0286     
0287     % Apply the CG scheme to compute the next search direction.
0288     %
0289     % This paper https://www.math.lsu.edu/~hozhang/papers/cgsurvey.pdf
0290     % by Hager and Zhang lists many known beta rules. The rules defined
0291     % here can be found in that paper (or are provided with additional
0292     % references), adapted to the Riemannian setting.
0293     %
0294     if strcmpi(options.beta_type, 'steep') || ...
0295        strcmpi(options.beta_type, 'S-D')              % Gradient Descent
0296         
0297         beta = 0;
0298         desc_dir = M.lincomb(newx, -1, Pnewgrad);
0299         
0300     else
0301         
0302         oldgrad = M.transp(x, newx, grad);
0303         orth_grads = M.inner(newx, oldgrad, Pnewgrad) / newgradPnewgrad;
0304         
0305         % Powell's restart strategy (see page 12 of Hager and Zhang's
0306         % survey on conjugate gradient methods, for example)
0307         if abs(orth_grads) >= options.orth_value
0308             beta = 0;
0309             desc_dir = M.lincomb(newx, -1, Pnewgrad);
0310             
0311         else % Compute the CG modification
0312             
0313             desc_dir = M.transp(x, newx, desc_dir);
0314             
0315             switch upper(options.beta_type)
0316             
0317                 case 'F-R'  % Fletcher-Reeves
0318                     beta = newgradPnewgrad / gradPgrad;
0319                 
0320                 case 'P-R'  % Polak-Ribiere+
0321                     % vector grad(new) - transported grad(current)
0322                     diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
0323                     ip_diff = M.inner(newx, Pnewgrad, diff);
0324                     beta = ip_diff / gradPgrad;
0325                     beta = max(0, beta);
0326                 
0327                 case 'H-S'  % Hestenes-Stiefel+
0328                     diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
0329                     ip_diff = M.inner(newx, Pnewgrad, diff);
0330                     beta = ip_diff / M.inner(newx, diff, desc_dir);
0331                     beta = max(0, beta);
0332 
0333                 case 'H-Z' % Hager-Zhang+
0334                     diff = M.lincomb(newx, 1, newgrad, -1, oldgrad);
0335                     Poldgrad = M.transp(x, newx, Pgrad);
0336                     Pdiff = M.lincomb(newx, 1, Pnewgrad, -1, Poldgrad);
0337                     deno = M.inner(newx, diff, desc_dir);
0338                     numo = M.inner(newx, diff, Pnewgrad);
0339                     numo = numo - 2*M.inner(newx, diff, Pdiff)*...
0340                                      M.inner(newx, desc_dir, newgrad) / deno;
0341                     beta = numo / deno;
0342 
0343                     % Robustness (see Hager-Zhang paper mentioned above)
0344                     desc_dir_norm = M.norm(newx, desc_dir);
0345                     eta_HZ = -1 / ( desc_dir_norm * min(0.01, gradnorm) );
0346                     beta = max(beta, eta_HZ);
0347 
0348                 otherwise
0349                     error(['Unknown options.beta_type. ' ...
0350                            'Should be steep, S-D, F-R, P-R, H-S or H-Z.']);
0351             end
0352             
0353             desc_dir = M.lincomb(newx, -1, Pnewgrad, beta, desc_dir);
0354         
0355         end
0356         
0357     end
0358     
0359     % Transfer iterate info.
0360     storedb.removefirstifdifferent(key, newkey);
0361     x = newx;
0362     key = newkey;
0363     cost = newcost;
0364     grad = newgrad;
0365     Pgrad = Pnewgrad;
0366     gradnorm = newgradnorm;
0367     gradPgrad = newgradPnewgrad;
0368     
0369     % iter is the number of iterations we have accomplished.
0370     iter = iter + 1;
0371     
0372     % Make sure we don't use too much memory for the store database.
0373     storedb.purge();
0374     
0375     % Log statistics for freshly executed iteration.
0376     stats = savestats();
0377     info(iter+1) = stats;
0378     
0379 end
0380 
0381 
0382 info = info(1:iter+1);
0383 
0384 if options.verbosity >= 1
0385     fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
0386 end
0387 
0388 
0389 % Routine in charge of collecting the current iteration stats
0390 function stats = savestats()
0391     stats.iter = iter;
0392     stats.cost = cost;
0393     stats.gradnorm = gradnorm;
0394     if iter == 0
0395         stats.stepsize = nan;
0396         stats.time = toc(timetic);
0397         stats.linesearch = [];
0398         stats.beta = 0;
0399     else
0400         stats.stepsize = stepsize;
0401         stats.time = info(iter).time + toc(timetic);
0402         stats.linesearch = lsstats;
0403         stats.beta = beta;
0404     end
0405     stats = applyStatsfun(problem, x, storedb, key, options, stats);
0406 end
0407 
0408 end
0409 
0410

Generated on Mon 10-Sep-2018 11:48:06 by m2html © 2005