
trustregions

PURPOSE

Riemannian trust-regions solver for optimization on manifolds.

SYNOPSIS

function [x, cost, info, options] = trustregions(problem, x, options)

DESCRIPTION

 Riemannian trust-regions solver for optimization on manifolds.

 function [x, cost, info, options] = trustregions(problem)
 function [x, cost, info, options] = trustregions(problem, x0)
 function [x, cost, info, options] = trustregions(problem, x0, options)
 function [x, cost, info, options] = trustregions(problem, [], options)

 This is the Riemannian Trust-Region solver (with tCG inner solve), named
 RTR. This solver will attempt to minimize the cost function described in
 the problem structure. It requires the availability of the cost function
 and of its gradient. It will issue calls for the Hessian. If no Hessian
 nor approximate Hessian is provided, a standard approximation of the
 Hessian based on the gradient will be computed. If a preconditioner for
 the Hessian is provided, it will be used.

 If no gradient is provided, an approximation of the gradient is computed,
 but this can be slow for manifolds of high dimension.
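
 For instance, a problem structure with all three ingredients (cost,
 gradient, Hessian) might be set up as follows. This is only a sketch, not
 part of this file: the Rayleigh-quotient cost, the matrix A and the use of
 spherefactory are illustrative. Manopt converts the Euclidean egrad/ehess
 fields to their Riemannian counterparts automatically.

     n = 100;
     A = randn(n); A = .5*(A + A');    % a symmetric matrix
     problem.M = spherefactory(n);     % unit sphere in R^n
     problem.cost  = @(x) -x'*A*x;     % minimized at a dominant
     problem.egrad = @(x) -2*A*x;      % eigenvector of A
     problem.ehess = @(x, u) -2*A*u;
     x = trustregions(problem);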

 For a description of the algorithm and theorems offering convergence
 guarantees, see the references below. Documentation for this solver is
 available online at:

 http://www.manopt.org/solver_documentation_trustregions.html


 The initial iterate is x0 if it is provided. Otherwise, a random point on
 the manifold is picked. To specify options whilst not specifying an
 initial iterate, give x0 as [] (the empty matrix).
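
 For example, to run with a random initial iterate but user-chosen options
 (the option value below is illustrative):

     options.maxtime = 3600;
     [x, xcost, info] = trustregions(problem, [], options);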

 The two outputs 'x' and 'cost' are the last reached point on the manifold
 and its cost. Notice that x is not necessarily the best reached point,
 because this solver is not forced to be a descent method. In particular,
 very close to convergence, it is sometimes preferable to accept very
 slight increases in the cost value (on the order of the machine epsilon)
 in the process of reaching fine convergence.
 
 The output 'info' is a struct-array which contains information about the
 iterations:
   iter (integer)
       The (outer) iteration number, or number of steps considered
       (whether accepted or rejected). The initial guess is 0.
    cost (double)
       The corresponding cost value.
    gradnorm (double)
       The (Riemannian) norm of the gradient.
    numinner (integer)
       The number of inner iterations executed to compute this iterate.
       Inner iterations are truncated-CG steps. Each one requires a
       Hessian (or approximate Hessian) evaluation.
    time (double)
       The total elapsed time in seconds to reach the corresponding cost.
    rho (double)
       The performance ratio for the iterate.
    rhonum, rhoden (double)
       Regularized numerator and denominator of the performance ratio:
       rho = rhonum/rhoden. See options.rho_regularization.
    accepted (boolean)
       Whether the proposed iterate was accepted or not.
    stepsize (double)
       The (Riemannian) norm of the vector returned by the inner solver
       tCG and which is retracted to obtain the proposed next iterate. If
       accepted = true for the corresponding iterate, this is the size of
       the step from the previous to the new iterate. If accepted is
       false, the step was not executed and this is the size of the
       rejected step.
    Delta (double)
       The trust-region radius at the outer iteration.
    cauchy (boolean)
       Whether the Cauchy point was used or not (if useRand is true).
   And possibly additional information logged by options.statsfun.
 For example, type [info.gradnorm] to obtain a vector of the successive
 gradient norms reached at each (outer) iteration.
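
 For instance, the convergence history of a completed run can be inspected
 as follows (a small sketch, assuming the info output described above):

     gradnorms = [info.gradnorm];       % one value per outer iteration
     iters = [info.iter];
     semilogy(iters, gradnorms, '.-');
     xlabel('Outer iteration');
     ylabel('Gradient norm');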

 The options structure is used to overwrite the default values. All
 options have a default value and are hence optional. To force an option
 value, pass an options structure with a field options.optionname, where
 optionname is one of the following and the default value is indicated
 between parentheses:

   tolgradnorm (1e-6)
       The algorithm terminates if the norm of the gradient drops below
       this. For well-scaled problems, a rule of thumb is that you can
       expect to reduce the gradient norm by 8 orders of magnitude
       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
       rough initial iterate for example). Further decrease is sometimes
       possible, but inexact floating point arithmetic will eventually
       limit the final accuracy. If tolgradnorm is set too low, the
       algorithm may end up iterating forever (or at least until another
       stopping criterion triggers).
   maxiter (1000)
       The algorithm terminates if maxiter (outer) iterations were executed.
   maxtime (Inf)
       The algorithm terminates if maxtime seconds elapsed.
    miniter (3)
       Minimum number of outer iterations (used only if useRand is true).
    mininner (1)
       Minimum number of inner iterations (for tCG).
    maxinner (problem.M.dim() : the manifold's dimension)
       Maximum number of inner iterations (for tCG).
    Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
       Maximum trust-region radius. If you specify this parameter but not
       Delta0, then Delta0 will be set to 1/8 times this parameter.
   Delta0 (Delta_bar/8)
       Initial trust-region radius. If you observe a long plateau at the
       beginning of the convergence plot (gradient norm VS iteration), it
       may pay off to try to tune this parameter to shorten the plateau.
       You should not set this parameter without setting Delta_bar too (at
       a larger value).
    useRand (false)
       Set to true if the trust-region solve is to be initiated with a
       random tangent vector. If set to true, no preconditioner will be
       used. This option is set to true in some scenarios to escape saddle
       points, but is otherwise seldom activated.
    kappa (0.1)
       tCG inner kappa convergence tolerance.
       kappa > 0 is the linear convergence target rate: tCG will terminate
       early if the residual was reduced by a factor of kappa.
    theta (1.0)
       tCG inner theta convergence tolerance.
       1+theta (theta between 0 and 1) is the superlinear convergence
       target rate. tCG will terminate early if the residual was reduced
       by a power of 1+theta.
    rho_prime (0.1)
       Accept/reject threshold: if rho is at least rho_prime, the outer
       iteration is accepted; otherwise, it is rejected. When a step is
       rejected, the trust-region radius is decreased. To ensure this,
       rho_prime must be nonnegative and strictly smaller than 1/4.
       If rho_prime is negative, the algorithm is not guaranteed to
       produce monotonically decreasing cost values. It is strongly
       recommended to set rho_prime > 0, to aid convergence.
   rho_regularization (1e3)
       Close to convergence, evaluating the performance ratio rho is
       numerically challenging. Meanwhile, close to convergence, the
       quadratic model should be a good fit and the steps should be
       accepted. Regularization lets rho go to 1 as the model decrease and
       the actual decrease go to zero. Set this option to zero to disable
       regularization (not recommended). See in-code for the specifics.
       When this is not zero, it may happen that the iterates produced are
       not monotonically improving the cost when very close to
       convergence. This is because the corrected cost improvement could
       change sign if it is negative but very small.
   statsfun (none)
       Function handle to a function that will be called after each
       iteration to provide the opportunity to log additional statistics.
       They will be returned in the info struct. See the generic Manopt
       documentation about solvers for further information. statsfun is
       called with the point x that was reached last, after the
       accept/reject decision. See comment below.
   stopfun (none)
       Function handle to a function that will be called at each iteration
       to provide the opportunity to specify additional stopping criteria.
       See the generic Manopt documentation about solvers for further
       information.
   verbosity (2)
       Integer number used to tune the amount of output the algorithm
       generates during execution (mostly as text in the command window).
       The higher, the more output. 0 means silent. 3 and above includes a
       display of the options structure at the beginning of the execution.
   debug (false)
       Set to true to allow the algorithm to perform additional
       computations for debugging purposes. If a debugging test fails, you
       will be informed of it, usually via the command window. Be aware
       that these additional computations appear in the algorithm timings
       too, and may interfere with operations such as counting the number
       of cost evaluations, etc. (the debug calls get storedb too).
   storedepth (2)
       Maximum number of different points x of the manifold for which a
       store structure will be kept in memory in the storedb for caching.
       If memory usage is an issue, you may try to lower this number.
       Profiling or manopt counters may then help to investigate if a
       performance hit was incurred as a result.
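
 As an illustration of the syntax, the trust-region radii and the inner
 solver can be tuned jointly as follows; the values are examples only, not
 recommendations, and x0 denotes any valid initial iterate:

     options.Delta_bar = 10;                 % maximum trust-region radius
     options.Delta0 = options.Delta_bar/8;   % initial trust-region radius
     options.maxinner = 200;                 % cap on tCG (inner) iterations
     options.tolgradnorm = 1e-8;
     x = trustregions(problem, x0, options);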

 Notice that statsfun is called with the point x that was reached last,
 after the accept/reject decision. Hence: if the step was accepted, we get
 that new x, with a store which only saw the call for the cost and for the
 gradient. If the step was rejected, we get the same x as previously, with
 the store structure containing everything that was computed at that point
 (possibly including previous rejects at that same point). Hence, statsfun
 should not be used in conjunction with the store to count operations for
 example. Instead, you should use manopt counters: see statscounters.
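
 As an illustration, here is one way to log an extra statistic and to add a
 custom stopping criterion (a sketch: xref and the 1e-10 target are
 hypothetical, and M.dist is available for most manifold factories):

     % Record the distance from the latest iterate to a reference point.
     options.statsfun = @(problem, x, stats) ...
         setfield(stats, 'dist_to_xref', problem.M.dist(x, xref));
     % Stop as soon as the cost drops below a target value.
     options.stopfun = @(problem, x, info, last) info(last).cost < 1e-10;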


 Please cite the Manopt paper as well as the research paper:
     @Article{genrtr,
       Title    = {Trust-region methods on {Riemannian} manifolds},
       Author   = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
       Journal  = {Foundations of Computational Mathematics},
       Year     = {2007},
       Number   = {3},
       Pages    = {303--330},
       Volume   = {7},
       Doi      = {10.1007/s10208-005-0179-9}
     }

 See also: steepestdescent conjugategradient manopt/examples

CROSS-REFERENCE INFORMATION

This function calls:

This function is called by:

SUBFUNCTIONS

SOURCE CODE

0001 function [x, cost, info, options] = trustregions(problem, x, options)
0002 % Riemannian trust-regions solver for optimization on manifolds.
0003 %
0004 % function [x, cost, info, options] = trustregions(problem)
0005 % function [x, cost, info, options] = trustregions(problem, x0)
0006 % function [x, cost, info, options] = trustregions(problem, x0, options)
0007 % function [x, cost, info, options] = trustregions(problem, [], options)
0008 %
0009 % This is the Riemannian Trust-Region solver (with tCG inner solve), named
0010 % RTR. This solver will attempt to minimize the cost function described in
0011 % the problem structure. It requires the availability of the cost function
0012 % and of its gradient. It will issue calls for the Hessian. If no Hessian
0013 % nor approximate Hessian is provided, a standard approximation of the
0014 % Hessian based on the gradient will be computed. If a preconditioner for
0015 % the Hessian is provided, it will be used.
0016 %
0017 % If no gradient is provided, an approximation of the gradient is computed,
0018 % but this can be slow for manifolds of high dimension.
0019 %
0020 % For a description of the algorithm and theorems offering convergence
0021 % guarantees, see the references below. Documentation for this solver is
0022 % available online at:
0023 %
0024 % http://www.manopt.org/solver_documentation_trustregions.html
0025 %
0026 %
0027 % The initial iterate is x0 if it is provided. Otherwise, a random point on
0028 % the manifold is picked. To specify options whilst not specifying an
0029 % initial iterate, give x0 as [] (the empty matrix).
0030 %
0031 % The two outputs 'x' and 'cost' are the last reached point on the manifold
0032 % and its cost. Notice that x is not necessarily the best reached point,
0033 % because this solver is not forced to be a descent method. In particular,
0034 % very close to convergence, it is sometimes preferable to accept very
0035 % slight increases in the cost value (on the order of the machine epsilon)
0036 % in the process of reaching fine convergence.
0037 %
0038 % The output 'info' is a struct-array which contains information about the
0039 % iterations:
0040 %   iter (integer)
0041 %       The (outer) iteration number, or number of steps considered
0042 %       (whether accepted or rejected). The initial guess is 0.
0043 %    cost (double)
0044 %       The corresponding cost value.
0045 %    gradnorm (double)
0046 %       The (Riemannian) norm of the gradient.
0047 %    numinner (integer)
0048 %       The number of inner iterations executed to compute this iterate.
0049 %       Inner iterations are truncated-CG steps. Each one requires a
0050 %       Hessian (or approximate Hessian) evaluation.
0051 %    time (double)
0052 %       The total elapsed time in seconds to reach the corresponding cost.
0053 %    rho (double)
0054 %       The performance ratio for the iterate.
0055 %    rhonum, rhoden (double)
0056 %       Regularized numerator and denominator of the performance ratio:
0057 %       rho = rhonum/rhoden. See options.rho_regularization.
0058 %    accepted (boolean)
0059 %       Whether the proposed iterate was accepted or not.
0060 %    stepsize (double)
0061 %       The (Riemannian) norm of the vector returned by the inner solver
0062 %       tCG and which is retracted to obtain the proposed next iterate. If
0063 %       accepted = true for the corresponding iterate, this is the size of
0064 %       the step from the previous to the new iterate. If accepted is
0065 %       false, the step was not executed and this is the size of the
0066 %       rejected step.
0067 %    Delta (double)
0068 %       The trust-region radius at the outer iteration.
0069 %    cauchy (boolean)
0070 %       Whether the Cauchy point was used or not (if useRand is true).
0071 %   And possibly additional information logged by options.statsfun.
0072 % For example, type [info.gradnorm] to obtain a vector of the successive
0073 % gradient norms reached at each (outer) iteration.
0074 %
0075 % The options structure is used to overwrite the default values. All
0076 % options have a default value and are hence optional. To force an option
0077 % value, pass an options structure with a field options.optionname, where
0078 % optionname is one of the following and the default value is indicated
0079 % between parentheses:
0080 %
0081 %   tolgradnorm (1e-6)
0082 %       The algorithm terminates if the norm of the gradient drops below
0083 %       this. For well-scaled problems, a rule of thumb is that you can
0084 %       expect to reduce the gradient norm by 8 orders of magnitude
0085 %       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
0086 %       rough initial iterate for example). Further decrease is sometimes
0087 %       possible, but inexact floating point arithmetic will eventually
0088 %       limit the final accuracy. If tolgradnorm is set too low, the
0089 %       algorithm may end up iterating forever (or at least until another
0090 %       stopping criterion triggers).
0091 %   maxiter (1000)
0092 %       The algorithm terminates if maxiter (outer) iterations were executed.
0093 %   maxtime (Inf)
0094 %       The algorithm terminates if maxtime seconds elapsed.
0095 %    miniter (3)
0096 %       Minimum number of outer iterations (used only if useRand is true).
0097 %    mininner (1)
0098 %       Minimum number of inner iterations (for tCG).
0099 %    maxinner (problem.M.dim() : the manifold's dimension)
0100 %       Maximum number of inner iterations (for tCG).
0101 %    Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
0102 %       Maximum trust-region radius. If you specify this parameter but not
0103 %       Delta0, then Delta0 will be set to 1/8 times this parameter.
0104 %   Delta0 (Delta_bar/8)
0105 %       Initial trust-region radius. If you observe a long plateau at the
0106 %       beginning of the convergence plot (gradient norm VS iteration), it
0107 %       may pay off to try to tune this parameter to shorten the plateau.
0108 %       You should not set this parameter without setting Delta_bar too (at
0109 %       a larger value).
0110 %    useRand (false)
0111 %       Set to true if the trust-region solve is to be initiated with a
0112 %       random tangent vector. If set to true, no preconditioner will be
0113 %       used. This option is set to true in some scenarios to escape saddle
0114 %       points, but is otherwise seldom activated.
0115 %    kappa (0.1)
0116 %       tCG inner kappa convergence tolerance.
0117 %       kappa > 0 is the linear convergence target rate: tCG will terminate
0118 %       early if the residual was reduced by a factor of kappa.
0119 %    theta (1.0)
0120 %       tCG inner theta convergence tolerance.
0121 %       1+theta (theta between 0 and 1) is the superlinear convergence
0122 %       target rate. tCG will terminate early if the residual was reduced
0123 %       by a power of 1+theta.
0124 %    rho_prime (0.1)
0125 %       Accept/reject threshold: if rho is at least rho_prime, the outer
0126 %       iteration is accepted; otherwise, it is rejected. When a step is
0127 %       rejected, the trust-region radius is decreased. To ensure this,
0128 %       rho_prime must be nonnegative and strictly smaller than 1/4.
0129 %       If rho_prime is negative, the algorithm is not guaranteed to
0130 %       produce monotonically decreasing cost values. It is strongly
0131 %       recommended to set rho_prime > 0, to aid convergence.
0132 %   rho_regularization (1e3)
0133 %       Close to convergence, evaluating the performance ratio rho is
0134 %       numerically challenging. Meanwhile, close to convergence, the
0135 %       quadratic model should be a good fit and the steps should be
0136 %       accepted. Regularization lets rho go to 1 as the model decrease and
0137 %       the actual decrease go to zero. Set this option to zero to disable
0138 %       regularization (not recommended). See in-code for the specifics.
0139 %       When this is not zero, it may happen that the iterates produced are
0140 %       not monotonically improving the cost when very close to
0141 %       convergence. This is because the corrected cost improvement could
0142 %       change sign if it is negative but very small.
0143 %   statsfun (none)
0144 %       Function handle to a function that will be called after each
0145 %       iteration to provide the opportunity to log additional statistics.
0146 %       They will be returned in the info struct. See the generic Manopt
0147 %       documentation about solvers for further information. statsfun is
0148 %       called with the point x that was reached last, after the
0149 %       accept/reject decision. See comment below.
0150 %   stopfun (none)
0151 %       Function handle to a function that will be called at each iteration
0152 %       to provide the opportunity to specify additional stopping criteria.
0153 %       See the generic Manopt documentation about solvers for further
0154 %       information.
0155 %   verbosity (2)
0156 %       Integer number used to tune the amount of output the algorithm
0157 %       generates during execution (mostly as text in the command window).
0158 %       The higher, the more output. 0 means silent. 3 and above includes a
0159 %       display of the options structure at the beginning of the execution.
0160 %   debug (false)
0161 %       Set to true to allow the algorithm to perform additional
0162 %       computations for debugging purposes. If a debugging test fails, you
0163 %       will be informed of it, usually via the command window. Be aware
0164 %       that these additional computations appear in the algorithm timings
0165 %       too, and may interfere with operations such as counting the number
0166 %       of cost evaluations, etc. (the debug calls get storedb too).
0167 %   storedepth (2)
0168 %       Maximum number of different points x of the manifold for which a
0169 %       store structure will be kept in memory in the storedb for caching.
0170 %       If memory usage is an issue, you may try to lower this number.
0171 %       Profiling or manopt counters may then help to investigate if a
0172 %       performance hit was incurred as a result.
0173 %
0174 % Notice that statsfun is called with the point x that was reached last,
0175 % after the accept/reject decision. Hence: if the step was accepted, we get
0176 % that new x, with a store which only saw the call for the cost and for the
0177 % gradient. If the step was rejected, we get the same x as previously, with
0178 % the store structure containing everything that was computed at that point
0179 % (possibly including previous rejects at that same point). Hence, statsfun
0180 % should not be used in conjunction with the store to count operations for
0181 % example. Instead, you should use manopt counters: see statscounters.
0182 %
0183 %
0184 % Please cite the Manopt paper as well as the research paper:
0185 %     @Article{genrtr,
0186 %       Title    = {Trust-region methods on {Riemannian} manifolds},
0187 %       Author   = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
0188 %       Journal  = {Foundations of Computational Mathematics},
0189 %       Year     = {2007},
0190 %       Number   = {3},
0191 %       Pages    = {303--330},
0192 %       Volume   = {7},
0193 %       Doi      = {10.1007/s10208-005-0179-9}
0194 %     }
0195 %
0196 % See also: steepestdescent conjugategradient manopt/examples
0197 
0198 % An explicit, general listing of this algorithm, with preconditioning,
0199 % can be found in the following paper:
0200 %     @Article{boumal2015lowrank,
0201 %       Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
0202 %       Author  = {Boumal, N. and Absil, P.-A.},
0203 %       Journal = {Linear Algebra and its Applications},
0204 %       Year    = {2015},
0205 %       Pages   = {200--239},
0206 %       Volume  = {475},
0207 %       Doi     = {10.1016/j.laa.2015.02.027},
0208 %     }
0209 
0210 % When the Hessian is not specified, it is approximated with
0211 % finite-differences of the gradient. The resulting method is called
0212 % RTR-FD. Some convergence theory for it is available in this paper:
0213 % @incollection{boumal2015rtrfd
0214 %     author={Boumal, N.},
0215 %     title={Riemannian trust regions with finite-difference Hessian approximations are globally convergent},
0216 %     year={2015},
0217 %     booktitle={Geometric Science of Information}
0218 % }
0219 
0220 
0221 % This file is part of Manopt: www.manopt.org.
0222 % This code is an adaptation to Manopt of the original GenRTR code:
0223 % RTR - Riemannian Trust-Region
0224 % (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
0225 % Florida State University
0226 % School of Computational Science
0227 % (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
0228 % See accompanying license file.
0229 % The adaptation was executed by Nicolas Boumal.
0230 %
0231 %
0232 % Change log:
0233 %
0234 %   NB April 3, 2013:
0235 %       tCG now returns the Hessian along the returned direction eta, so
0236 %       that we do not compute that Hessian redundantly: some savings at
0237 %       each iteration. Similarly, if the useRand flag is on, we spare an
0238 %       extra Hessian computation at each outer iteration too, owing to
0239 %       some modifications in the Cauchy point section of the code specific
0240 %       to useRand = true.
0241 %
0242 %   NB Aug. 22, 2013:
0243 %       This function is now Octave compatible. The transition called for
0244 %       two changes which would otherwise not be advisable. (1) tic/toc is
0245 %       now used as is, as opposed to the safer way:
0246 %       t = tic(); elapsed = toc(t);
0247 %       And (2), the (formerly inner) function savestats was moved outside
0248 %       the main function to not be nested anymore. This is arguably less
0249 %       elegant, but Octave does not (and likely will not) support nested
0250 %       functions.
0251 %
0252 %   NB Dec. 2, 2013:
0253 %       The in-code documentation was largely revised and expanded.
0254 %
0255 %   NB Dec. 2, 2013:
0256 %       The former heuristic which triggered when rhonum was very small and
0257 %       forced rho = 1 has been replaced by a smoother heuristic which
0258 %       consists in regularizing rhonum and rhoden before computing their
0259 %       ratio. It is tunable via options.rho_regularization. Furthermore,
0260 %       the solver now detects if tCG did not obtain a model decrease
0261 %       (which is theoretically impossible but may happen because of
0262 %       numerical errors and/or because of a nonlinear/nonsymmetric Hessian
0263 %       operator, which is the case for finite difference approximations).
0264 %       When such an anomaly is detected, the step is rejected and the
0265 %       trust region radius is decreased.
0266 %       Feb. 18, 2015 note: this is less useful now, as tCG now guarantees
0267 %       model decrease even for the finite difference approximation of the
0268 %       Hessian. It is still useful in case of numerical errors, but this
0269 %       is less stringent.
0270 %
0271 %   NB Dec. 3, 2013:
0272 %       The stepsize is now registered at each iteration, at a small
0273 %       additional cost. The defaults for Delta_bar and Delta0 are better
0274 %       defined. Setting Delta_bar in the options will automatically set
0275 %       Delta0 accordingly. In Manopt 1.0.4, the defaults for these options
0276 %       were not treated appropriately because of an incorrect use of the
0277 %       isfield() built-in function.
0278 %
0279 %   NB Feb. 18, 2015:
0280 %       Added some comments. Also, Octave now supports safe tic/toc usage,
0281 %       so we reverted the changes to use that again (see Aug. 22, 2013 log
0282 %       entry).
0283 %
0284 %   NB April 3, 2015:
0285 %       Works with the new StoreDB class system.
0286 %
0287 %   NB April 8, 2015:
0288 %       No Hessian warning if approximate Hessian explicitly available.
0289 %
0290 %   NB Nov. 1, 2016:
0291 %       Now uses approximate gradient via finite differences if need be.
0292 %
0293 %   NB Aug. 2, 2018:
0294 %       Using storedb.remove() to keep the cache lean, which made it
0295 %       possible to reduce the default storedepth from 20 to 2.
0296 
0297 M = problem.M;
0298 
0299 % Verify that the problem description is sufficient for the solver.
0300 if ~canGetCost(problem)
0301     warning('manopt:getCost', ...
0302             'No cost provided. The algorithm will likely abort.');  
0303 end
0304 if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
0305     % Note: we do not give a warning if an approximate gradient is
0306     % explicitly given in the problem description, as in that case the user
0307     % seems to be aware of the issue.
0308     warning('manopt:getGradient:approx', ...
0309            ['No gradient provided. Using an FD approximation instead (slow).\n' ...
0310             'It may be necessary to increase options.tolgradnorm.\n' ...
0311             'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
0312     problem.approxgrad = approxgradientFD(problem);
0313 end
0314 if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
0315     % Note: we do not give a warning if an approximate Hessian is
0316     % explicitly given in the problem description, as in that case the user
0317     % seems to be aware of the issue.
0318     warning('manopt:getHessian:approx', ...
0319            ['No Hessian provided. Using an FD approximation instead.\n' ...
0320             'To disable this warning: warning(''off'', ''manopt:getHessian:approx'')']);
0321     problem.approxhess = approxhessianFD(problem);
0322 end
0323 
0324 % Define some strings for display
0325 tcg_stop_reason = {'negative curvature',...
0326                    'exceeded trust region',...
0327                    'reached target residual-kappa (linear)',...
0328                    'reached target residual-theta (superlinear)',...
0329                    'maximum inner iterations',...
0330                    'model increased'};
0331 
0332 % Set local defaults here
0333 localdefaults.verbosity = 2;
0334 localdefaults.maxtime = inf;
0335 localdefaults.miniter = 3;
0336 localdefaults.maxiter = 1000;
0337 localdefaults.mininner = 1;
0338 localdefaults.maxinner = M.dim();
0339 localdefaults.tolgradnorm = 1e-6;
0340 localdefaults.kappa = 0.1;
0341 localdefaults.theta = 1.0;
0342 localdefaults.rho_prime = 0.1;
0343 localdefaults.useRand = false;
0344 localdefaults.rho_regularization = 1e3;
0345 
0346 % Merge global and local defaults, then merge w/ user options, if any.
0347 localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
0348 if ~exist('options', 'var') || isempty(options)
0349     options = struct();
0350 end
0351 options = mergeOptions(localdefaults, options);
0352 
0353 % Set default Delta_bar and Delta0 separately to deal with additional
0354 % logic: if Delta_bar is provided but not Delta0, let Delta0 automatically
0355 % be some fraction of the provided Delta_bar.
0356 if ~isfield(options, 'Delta_bar')
0357     if isfield(M, 'typicaldist')
0358         options.Delta_bar = M.typicaldist();
0359     else
0360         options.Delta_bar = sqrt(M.dim());
0361     end 
0362 end
0363 if ~isfield(options,'Delta0')
0364     options.Delta0 = options.Delta_bar / 8;
0365 end
0366 
0367 % Check some option values
0368 assert(options.rho_prime < 1/4, ...
0369         'options.rho_prime must be strictly smaller than 1/4.');
0370 assert(options.Delta_bar > 0, ...
0371         'options.Delta_bar must be positive.');
0372 assert(options.Delta0 > 0 && options.Delta0 < options.Delta_bar, ...
0373         'options.Delta0 must be positive and smaller than Delta_bar.');
0374 
0375 % It is sometimes useful to check what the actual option values are.
0376 if options.verbosity >= 3
0377     disp(options);
0378 end
0379 
0380 ticstart = tic();
0381 
0382 % If no initial point x is given by the user, generate one at random.
0383 if ~exist('x', 'var') || isempty(x)
0384     x = M.rand();
0385 end
0386 
0387 % Create a store database and get a key for the current x
0388 storedb = StoreDB(options.storedepth);
0389 key = storedb.getNewKey();
0390 
0391 %% Initializations
0392 
0393 % k counts the outer (TR) iterations. The semantic is that k counts the
0394 % number of iterations fully executed so far.
0395 k = 0;
0396 
0397 % Initialize solution and companion measures: f(x), fgrad(x)
0398 [fx, fgradx] = getCostGrad(problem, x, storedb, key);
0399 norm_grad = M.norm(x, fgradx);
0400 
0401 % Initialize trust-region radius
0402 Delta = options.Delta0;
0403 
0404 % Save stats in a struct array info, and preallocate.
0405 if ~exist('used_cauchy', 'var')
0406     used_cauchy = [];
0407 end
0408 stats = savestats(problem, x, storedb, key, options, k, fx, norm_grad, Delta, ticstart);
0409 info(1) = stats;
0410 info(min(10000, options.maxiter+1)).iter = [];
0411 
0412 % ** Display:
0413 if options.verbosity == 2
0414    fprintf(['%3s %3s      %5s                %5s     ',...
0415             'f: %+e   |grad|: %e\n'],...
0416            '   ','   ','     ','     ', fx, norm_grad);
0417 elseif options.verbosity > 2
0418    fprintf('************************************************************************\n');
0419    fprintf('%3s %3s    k: %5s     num_inner: %5s     %s\n',...
0420            '','','______','______','');
0421    fprintf('       f(x) : %+e       |grad| : %e\n', fx, norm_grad);
0422    fprintf('      Delta : %f\n', Delta);
0423 end
0424 
0425 % To keep track of consecutive radius changes, so that we can warn the
0426 % user if it appears necessary.
0427 consecutive_TRplus = 0;
0428 consecutive_TRminus = 0;
0429 
0430 
0431 % **********************
0432 % ** Start of TR loop **
0433 % **********************
0434 while true
0435     
0436     % Start clock for this outer iteration
0437     ticstart = tic();
0438 
0439     % Run standard stopping criterion checks
0440     [stop, reason] = stoppingcriterion(problem, x, options, info, k+1);
0441     
0442     % If the stopping criterion that triggered is the tolerance on the
0443     % gradient norm but we are using randomization, make sure we make at
0444     % least miniter iterations to give randomization a chance at escaping
0445     % saddle points.
0446     if stop == 2 && options.useRand && k < options.miniter
0447         stop = 0;
0448     end
0449     
0450     if stop
0451         if options.verbosity >= 1
0452             fprintf([reason '\n']);
0453         end
0454         break;
0455     end
0456 
0457     if options.verbosity > 2 || options.debug > 0
0458         fprintf('************************************************************************\n');
0459     end
0460 
0461     % *************************
0462     % ** Begin TR Subproblem **
0463     % *************************
0464   
0465     % Determine eta0
0466     if ~options.useRand
0467         % Pick the zero vector
0468         eta = M.zerovec(x);
0469     else
0470         % Random vector in T_x M (this has to be very small)
0471         eta = M.lincomb(x, 1e-6, M.randvec(x));
0472         % Must be inside trust-region
0473         while M.norm(x, eta) > Delta
0474             eta = M.lincomb(x, sqrt(sqrt(eps)), eta);
0475         end
0476     end
0477 
0478     % Solve TR subproblem approximately
0479     [eta, Heta, numit, stop_inner] = ...
0480                 tCG(problem, x, fgradx, eta, Delta, options, storedb, key);
0481     srstr = tcg_stop_reason{stop_inner};
0482 
0483     % If using randomized approach, compare result with the Cauchy point.
0484     % Convergence proofs assume that we achieve at least (a fraction of)
0485     % the reduction of the Cauchy point. After this if-block, either all
0486     % eta-related quantities have been changed consistently, or none of
0487     % them have changed.
0488     if options.useRand
0489         used_cauchy = false;
0490         % Check the curvature,
0491         Hg = getHessian(problem, x, fgradx, storedb, key);
0492         g_Hg = M.inner(x, fgradx, Hg);
0493         if g_Hg <= 0
0494             tau_c = 1;
0495         else
0496             tau_c = min( norm_grad^3/(Delta*g_Hg) , 1);
0497         end
0498         % and generate the Cauchy point.
0499         eta_c  = M.lincomb(x, -tau_c * Delta / norm_grad, fgradx);
0500         Heta_c = M.lincomb(x, -tau_c * Delta / norm_grad, Hg);
0501 
0502         % Now that we have computed the Cauchy point in addition to the
0503         % returned eta, we might as well keep the best of them.
0504         mdle  = fx + M.inner(x, fgradx, eta) + .5*M.inner(x, Heta,   eta);
0505         mdlec = fx + M.inner(x, fgradx, eta_c) + .5*M.inner(x, Heta_c, eta_c);
0506         if mdlec < mdle
0507             eta = eta_c;
0508             Heta = Heta_c; % added April 11, 2012
0509             used_cauchy = true;
0510         end
0511     end
0512     
0513     
0514     % This is only computed for logging purposes, because it may be useful
0515     % for some user-defined stopping criteria. If this is not cheap for
0516     % specific applications (compared to evaluating the cost), we should
0517     % reconsider this.
0518     norm_eta = M.norm(x, eta);
0519     
0520     if options.debug > 0
0521         testangle = M.inner(x, eta, fgradx) / (norm_eta*norm_grad);
0522     end
0523     
0524 
0525     % Compute the tentative next iterate (the proposal)
0526     x_prop  = M.retr(x, eta);
0527     key_prop = storedb.getNewKey();
0528 
0529     % Compute the function value of the proposal
0530     fx_prop = getCost(problem, x_prop, storedb, key_prop);
0531 
0532     % Will we accept the proposal or not?
0533     % Check the performance of the quadratic model against the actual cost.
0534     rhonum = fx - fx_prop;
0535     vecrho = M.lincomb(x, 1, fgradx, .5, Heta);
0536     rhoden = -M.inner(x, eta, vecrho);
0537     % rhonum could be anything.
0538     % rhoden should be nonnegative, as guaranteed by tCG, barring numerical
0539     % errors.
0540     
0541     % Heuristic -- added Dec. 2, 2013 (NB) to replace the former heuristic.
0542     % This heuristic is documented in the book by Conn, Gould and Toint on
0543     % trust-region methods, section 17.4.2.
0544     % rhonum measures the difference between two numbers. Close to
0545     % convergence, these two numbers are very close to each other, so
0546     % that computing their difference is numerically challenging: there may
0547     % be a significant loss in accuracy. Since the acceptance or rejection
0548     % of the step is conditioned on the ratio between rhonum and rhoden,
0549     % large errors in rhonum result in a very large error in rho, hence in
0550     % erratic acceptance / rejection. Meanwhile, close to convergence,
0551     % steps are usually trustworthy and we should transition to a Newton-
0552     % like method, with rho=1 consistently. The heuristic thus shifts both
0553     % rhonum and rhoden by a small amount such that far from convergence,
0554     % the shift is irrelevant and close to convergence, the ratio rho goes
0555     % to 1, effectively promoting acceptance of the step.
0556     % The rationale is that close to convergence, both rhonum and rhoden
0557     % are quadratic in the distance between x and x_prop. Thus, when this
0558     % distance is on the order of sqrt(eps), the value of rhonum and rhoden
0559     % is on the order of eps, which is indistinguishable from the numerical
0560     % error, resulting in badly estimated rho's.
0561     % For abs(fx) < 1, this heuristic is invariant under offsets of f but
0562     % not under scaling of f. For abs(fx) > 1, the opposite holds. This
0563     % should not alarm us, as this heuristic only triggers at the very last
0564     % iterations if very fine convergence is demanded.
0565     rho_reg = max(1, abs(fx)) * eps * options.rho_regularization;
0566     rhonum = rhonum + rho_reg;
0567     rhoden = rhoden + rho_reg;
0568    
0569     if options.debug > 0
0570         fprintf('DBG:     rhonum : %e\n', rhonum);
0571         fprintf('DBG:     rhoden : %e\n', rhoden);
0572     end
0573     
0574     % This is always true if a linear, symmetric operator is used for the
0575     % Hessian (approximation) and if we had infinite numerical precision.
0576     % In practice, nonlinear approximations of the Hessian such as the
0577     % built-in finite difference approximation and finite numerical
0578     % accuracy can cause the model to increase. In such scenarios, we
0579     % decide to force a rejection of the step and a reduction of the
0580     % trust-region radius. We test the sign of the regularized rhoden since
0581     % the regularization is supposed to capture the accuracy to which
0582     % rhoden is computed: if rhoden were negative before regularization but
0583     % not after, that should not be (and is not) detected as a failure.
0584     %
0585     % Note (Feb. 17, 2015, NB): the most recent version of tCG already
0586     % includes a mechanism to ensure model decrease if the Cauchy step
0587     % attained a decrease (which is theoretically the case under very lax
0588     % assumptions). This being said, it is always possible that numerical
0589     % errors will prevent this, so that it is good to keep a safeguard.
0590     %
0591     % The current strategy is that, if this should happen, then we reject
0592     % the step and reduce the trust region radius. This also ensures that
0593     % the actual cost values are monotonically decreasing.
0594     %
0595     % [This bit of code seems not to trigger because tCG already ensures
0596     %  the model decreases even in the presence of non-linearities; but as
0597     %  a result the radius is not necessarily decreased. Perhaps we should
0598     %  change this with the proposed commented line below; needs testing.]
0599     %
0600     model_decreased = (rhoden >= 0);
0601     % model_decreased = (rhoden >= 0) && (stop_inner ~= 6);
0602     
0603     if ~model_decreased
0604         srstr = [srstr ', model did not decrease']; %#ok<AGROW>
0605     end
0606     
0607     rho = rhonum / rhoden;
0608     
0609     % Added June 30, 2015 following observation by BM.
0610     % With this modification, it is guaranteed that a step rejection is
0611     % always accompanied by a TR reduction. This prevents stagnation in
0612     % this "corner case" (NaN's really aren't supposed to occur, but it's
0613     % nice if we can handle them nonetheless).
0614     if isnan(rho)
0615         fprintf('rho is NaN! Forcing a radius decrease. This should not happen.\n');
0616         if isnan(fx_prop)
0617             fprintf('The cost function returned NaN (perhaps the retraction returned a bad point?)\n');
0618         else
0619             fprintf('The cost function did not return a NaN value.\n');
0620         end
0621     end
0622    
0623     if options.debug > 0
0624         m = @(x, eta) ...
0625           getCost(problem, x, storedb, key) + ...
0626           getDirectionalDerivative(problem, x, eta, storedb, key) + ...
0627              .5*M.inner(x, getHessian(problem, x, eta, storedb, key), eta);
0628         zerovec = M.zerovec(x);
0629         actrho = (fx - fx_prop) / (m(x, zerovec) - m(x, eta));
0630         fprintf('DBG:   new f(x) : %+e\n', fx_prop);
0631         fprintf('DBG: actual rho : %e\n', actrho);
0632         fprintf('DBG:   used rho : %e\n', rho);
0633     end
0634 
0635     % Choose the new TR radius based on the model performance
0636     trstr = '   ';
0637     % If the actual decrease is smaller than 1/4 of the predicted decrease,
0638     % then reduce the TR radius.
0639     if rho < 1/4 || ~model_decreased || isnan(rho)
0640         trstr = 'TR-';
0641         Delta = Delta/4;
0642         consecutive_TRplus = 0;
0643         consecutive_TRminus = consecutive_TRminus + 1;
0644         if consecutive_TRminus >= 5 && options.verbosity >= 2
0645             consecutive_TRminus = -inf;
0646             fprintf(' +++ Detected many consecutive TR- (radius decreases).\n');
0647             fprintf(' +++ Consider decreasing options.Delta_bar by an order of magnitude.\n');
0648             fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
0649         end
0650     % If the actual decrease is at least 3/4 of the predicted decrease and
0651     % the tCG (inner solve) hit the TR boundary, increase the TR radius.
0652     % We also keep track of the number of consecutive trust-region radius
0653     % increases. If there are many, this may indicate the need to adapt the
0654     % initial and maximum radii.
0655     elseif rho > 3/4 && (stop_inner == 1 || stop_inner == 2)
0656         trstr = 'TR+';
0657         Delta = min(2*Delta, options.Delta_bar);
0658         consecutive_TRminus = 0;
0659         consecutive_TRplus = consecutive_TRplus + 1;
0660         if consecutive_TRplus >= 5 && options.verbosity >= 1
0661             consecutive_TRplus = -inf;
0662             fprintf(' +++ Detected many consecutive TR+ (radius increases).\n');
0663             fprintf(' +++ Consider increasing options.Delta_bar by an order of magnitude.\n');
0664             fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
0665         end
0666     else
0667         % Otherwise, keep the TR radius constant.
0668         consecutive_TRplus = 0;
0669         consecutive_TRminus = 0;
0670     end
0671 
0672     % Choose to accept or reject the proposed step based on the model
0673     % performance. Note the strict inequality.
0674     if model_decreased && rho > options.rho_prime
0675         
0676         % April 17, 2018: a side effect of rho_regularization > 0 is that
0677         % it can happen that the cost function appears to go up (although
0678         % only by a small amount) for some accepted steps. We decide to
0679         % accept this because, numerically, computing the difference
0680         % between fx_prop and fx is more difficult than computing the
0681         % improvement in the model, because fx_prop and fx are on the same
0682         % order of magnitude yet are separated by a very small gap near
0683         % convergence, whereas the model improvement is computed as a sum
0684         % of two small terms. As a result, the step which seems bad may
0685         % turn out to be good, in that it may help reduce the gradient norm
0686         % for example. This update merely informs the user of this event.
0687         % In further updates, we could also introduce this as a stopping
0688         % criterion. It is then important to choose wisely which of x or
0689         % x_prop should be returned (perhaps the one with smallest
0690         % gradient?)
0691         if fx_prop > fx && options.verbosity >= 2
0692             fprintf(['Between line above and below, cost function ' ...
0693                      'increased by %g (step size: %g)\n'], ...
0694                      fx_prop - fx, norm_eta);
0695         end
0696         
0697         accept = true;
0698         accstr = 'acc';
0699         % We accept the step: no need to keep the old cache.
0700         storedb.removefirstifdifferent(key, key_prop);
0701         x = x_prop;
0702         key = key_prop;
0703         fx = fx_prop;
0704         fgradx = getGradient(problem, x, storedb, key);
0705         norm_grad = M.norm(x, fgradx);
0706     else
0707         % We reject the step: no need to keep cache related to the
0708         % tentative step.
0709         storedb.removefirstifdifferent(key_prop, key);
0710         accept = false;
0711         accstr = 'REJ';
0712     end
0713     
0714     % k is the number of iterations we have accomplished.
0715     k = k + 1;
0716     
0717     % Make sure we don't use too much memory for the store database
0718     storedb.purge();
0719     
0720 
0721     % Log statistics for freshly executed iteration.
0722     % Everything after this in the loop is not accounted for in the timing.
0723     stats = savestats(problem, x, storedb, key, options, k, fx, ...
0724                       norm_grad, Delta, ticstart, info, rho, rhonum, ...
0725                       rhoden, accept, numit, norm_eta, used_cauchy);
0726     info(k+1) = stats;
0727 
0728     
0729     % ** Display:
0730     if options.verbosity == 2
0731         fprintf(['%3s %3s   k: %5d     num_inner: %5d     ', ...
0732         'f: %+e   |grad|: %e   %s\n'], ...
0733         accstr,trstr,k,numit,fx,norm_grad,srstr);
0734     elseif options.verbosity > 2
0735         if options.useRand && used_cauchy
0736             fprintf('USED CAUCHY POINT\n');
0737         end
0738         fprintf('%3s %3s    k: %5d     num_inner: %5d     %s\n', ...
0739                 accstr, trstr, k, numit, srstr);
0740         fprintf('       f(x) : %+e     |grad| : %e\n',fx,norm_grad);
0741         if options.debug > 0
0742             fprintf('      Delta : %f          |eta| : %e\n',Delta,norm_eta);
0743         end
0744         fprintf('        rho : %e\n',rho);
0745     end
0746     if options.debug > 0
0747         fprintf('DBG: cos ang(eta,gradf): %d\n',testangle);
0748         if rho == 0
0749             fprintf('DBG: rho = 0, this will likely hinder further convergence.\n');
0750         end
0751     end
0752 
0753 end  % of TR loop (counter: k)
0754 
0755 % Restrict info struct-array to useful part
0756 info = info(1:k+1);
0757 
0758 
0759 if (options.verbosity > 2) || (options.debug > 0)
0760    fprintf('************************************************************************\n');
0761 end
0762 if (options.verbosity > 0) || (options.debug > 0)
0763     fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
0764 end
0765 
0766 % Return the best cost reached
0767 cost = fx;
0768 
0769 end
0770 
0771 
0772 
0773     
0774 
0775 % Routine in charge of collecting the current iteration stats
0776 function stats = savestats(problem, x, storedb, key, options, k, fx, ...
0777                            norm_grad, Delta, ticstart, info, rho, rhonum, ...
0778                            rhoden, accept, numit, norm_eta, used_cauchy)
0779     stats.iter = k;
0780     stats.cost = fx;
0781     stats.gradnorm = norm_grad;
0782     stats.Delta = Delta;
0783     if k == 0
0784         stats.time = toc(ticstart);
0785         stats.rho = inf;
0786         stats.rhonum = NaN;
0787         stats.rhoden = NaN;
0788         stats.accepted = true;
0789         stats.numinner = 0;
0790         stats.stepsize = NaN;
0791         if options.useRand
0792             stats.cauchy = false;
0793         end
0794     else
0795         stats.time = info(k).time + toc(ticstart);
0796         stats.rho = rho;
0797         stats.rhonum = rhonum;
0798         stats.rhoden = rhoden;
0799         stats.accepted = accept;
0800         stats.numinner = numit;
0801         stats.stepsize = norm_eta;
0802         if options.useRand
0803           stats.cauchy = used_cauchy;
0804         end
0805     end
0806     
0807     % See comment about statsfun above: the x and store passed to statsfun
0808     % are that of the most recently accepted point after the iteration
0809     % fully executed.
0810     stats = applyStatsfun(problem, x, storedb, key, options, stats);
0811     
0812 end

Generated on Mon 10-Sep-2018 11:48:06 by m2html © 2005