function [U, S, V, info] = truncated_svd(A, p)
% Returns an SVD decomposition of A truncated to rank p.
%
% function [U, S, V, info] = truncated_svd(A, p)
%
% Input: A real matrix A of size mxn and an integer p <= min(m, n).
% Output: An orthonormal matrix U of size mxp, an orthonormal matrix V of
% size nxp and a diagonal matrix S of size pxp with nonnegative and
% decreasing diagonal entries such that USV.' is the best rank-p
% approximation of A according to the Frobenius norm. All real.
% This function produces an output akin to svds.
%
% The decomposition is obtained by maximizing
%   f(U, V) = .5*norm(U'*A*V, 'fro')^2
% where U, V are orthonormal. Notice that f(U*Q, V*R) = f(U, V) for all
% orthogonal pxp matrices Q, R. Hence, only the column spaces of U and V
% matter and we may perform the optimization over a product of two
% Grassmann manifolds.
%
% It is easy to show that maximizing f is equivalent to minimizing g with
%   g(U, V) = min_S norm(U*S*V' - A, 'fro')^2,
% which confirms that we are going for a best low-rank approximation of A.
%
% The inner workings of the Grassmann manifold use the built-in svd
% function of Matlab, but only for matrices of size mxp and nxp, to
% re-orthonormalize them.
%
% Notice that we are actually chasing a best fixed-rank approximation of a
% matrix, which is best obtained by working directly over a manifold of
% fixed-rank matrices. This is simply an example script to demonstrate some
% functionalities of the toolbox.
%
% The code can be modified to accept a function handle for A(x) = A*x
% instead of a matrix A, which is often useful. This would further require
% a function handle At for the transpose of A, such that At(x) = A.'*x.

% This file is part of Manopt and is copyrighted. See the license file.
%
% Main author: Nicolas Boumal, July 5, 2013
% Contributors:
%
% Change log:
%
%   Aug. 23, 2021 (XJ):
%       Added AD to compute the egrad and the ehess

    % Generate some random data to test the function if none is given.
    if ~exist('A', 'var') || isempty(A)
        A = randn(42, 60);
    end
    if ~exist('p', 'var') || isempty(p)
        p = 5;
    end

    % Retrieve the size of the problem and make sure the requested
    % approximation rank is at most the maximum possible rank.
    [m, n] = size(A);
    assert(p <= min(m, n), ...
           'p must be at most the smallest dimension of A.');

    % Define the cost and its derivatives on the Grassmann manifold.
    tuple.U = grassmannfactory(m, p);
    tuple.V = grassmannfactory(n, p);
    % All of the code will work just as well if we ignore the invariance
    % property of the cost function indicated above and thus place U and V
    % on the Stiefel manifold (orthonormal matrices) instead of the
    % Grassmann manifold. Working on Stiefel is expected to be slower
    % though, partly because the search space is higher dimensional and
    % partly because the optimizers are not isolated.
    % tuple.U = stiefelfactory(m, p);
    % tuple.V = stiefelfactory(n, p);
    M = productmanifold(tuple);
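    % As mentioned in the help text, A could be supplied as function
    % handles rather than as an explicit matrix. Below is an untested
    % sketch (not part of the original example) of how the cost and
    % gradient might then be phrased, assuming hypothetical handles
    % Afun and Atfun with Afun(X) = A*X and Atfun(X) = A.'*X; the names
    % are illustrative only:
    %
    % function f = cost_handle(X)
    %     f = -.5*norm(X.U'*Afun(X.V), 'fro')^2;
    % end
    % function g = egrad_handle(X)
    %     AV  = Afun(X.V);
    %     AtU = Atfun(X.U);
    %     g.U = -AV*(AV'*X.U);
    %     g.V = -AtU*(AtU'*X.V);
    % end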
    % Define a problem structure, specifying the manifold M, the cost
    % function and its derivatives. Here, to demonstrate the rapid
    % prototyping capabilities of Manopt, we directly define the Euclidean
    % gradient and the Euclidean Hessian egrad and ehess instead of the
    % Riemannian gradient and Hessian grad and hess. Manopt will take care
    % of the conversion. This automatic conversion is usually not
    % computationally optimal though, because much of the computations
    % involved in obtaining the gradient could be reused to obtain the
    % Hessian. After the prototyping stage, when efficiency becomes
    % important, it makes sense to define grad and hess rather than egrad
    % and ehess, and to use the caching system (the store structure).
    problem.M = M;
    problem.cost = @cost;
    problem.egrad = @egrad;
    problem.ehess = @ehess;

    % The functions below make many redundant computations. This
    % performance hit can be alleviated by using the caching system.

    % Cost function
    function f = cost(X)
        U = X.U;
        V = X.V;
        f = -.5*norm(U'*A*V, 'fro')^2;
    end
    % Euclidean gradient of the cost function
    function g = egrad(X)
        U = X.U;
        V = X.V;
        AV = A*V;
        AtU = A'*U;
        g.U = -AV*(AV'*U);
        g.V = -AtU*(AtU'*V);
    end
    % Euclidean Hessian of the cost function
    function h = ehess(X, H)
        U = X.U;
        V = X.V;
        Udot = H.U;
        Vdot = H.V;
        AV = A*V;
        AtU = A'*U;
        AVdot = A*Vdot;
        AtUdot = A'*Udot;
        h.U = -(AVdot*AV'*U + AV*AVdot'*U + AV*AV'*Udot);
        h.V = -(AtUdot*AtU'*V + AtU*AtUdot'*V + AtU*AtU'*Vdot);
    end

    % An alternative way to compute the egrad and the ehess is to use
    % automatic differentiation provided in the Deep Learning Toolbox
    % (slower). Notice that the function norm is not supported for AD so
    % far. Replace norm(..., 'fro') with the backup function cnormsqfro
    % described in manoptADhelp.
    % problem.cost = @cost_AD;
    % function f = cost_AD(X)
    %     U = X.U;
    %     V = X.V;
    %     f = -.5*cnormsqfro(U'*A*V);
    % end
    % Call manoptAD to prepare AD for the problem structure.
    % problem = manoptAD(problem);

    % Execute some checks on the derivatives for early debugging.
    % These things can be commented out of course.
    % checkgradient(problem);
    % pause;
    % checkhessian(problem);
    % pause;

    % Issue a call to a solver. A random initial guess will be chosen and
    % default options are selected. Here, we specify a maximum trust
    % region radius (which in turn induces an initial trust region radius).
    % Note that this is not required: default values are used if we omit
    % this. The diameter of the manifold scales like sqrt(2*p), hence the
    % form of our (empirical) choice.
    options.Delta_bar = 4*sqrt(2*p);
    [X, Xcost, info] = trustregions(problem, [], options); %#ok<ASGLU>
    U = X.U;
    V = X.V;

    % Finish the job by rotating U and V such that the middle matrix S can
    % be diagonal with nonnegative, decreasing entries. This requires a
    % small svd of size pxp.
    Spp = U'*A*V;
    [Upp, Spp, Vpp] = svd(Spp);
    U = U*Upp;
    S = Spp;
    V = V*Vpp;
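    % For a quick sanity check (not part of the original example), one can
    % compare against Matlab's svds, which computes the same truncated SVD
    % directly. The two rank-p approximations should essentially coincide
    % if the solver reached the global optimizer:
    % [U2, S2, V2] = svds(A, p);
    % norm(U*S*V' - U2*S2*V2', 'fro')   % expected to be close to zero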
    % For our information, Manopt can also compute the spectrum of the
    % Riemannian Hessian on the tangent space at (any) X. Computing the
    % spectrum at the solution gives us some idea of the conditioning of
    % the problem. If we were to implement a preconditioner for the
    % Hessian, this would also inform us on its performance.
    %
    % Notice that if the optimization is performed on a product of Stiefel
    % manifolds instead of a product of Grassmannians, the double
    % invariance under the orthogonal group O(p) will appear as twice
    % p*(p-1)/2, thus p*(p-1) zero eigenvalues in the spectrum of the
    % Hessian. This means that the minimizers are not isolated, which
    % typically hinders convergence of second order algorithms.
    if M.dim() < 512
        evs = hessianspectrum(problem, X);
        stairs(sort(evs));
        title(['Eigenvalues of the Hessian of the cost function ' ...
               'at the solution']);
    end

end
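
% Example usage from the Matlab prompt (illustrative; the sizes below are
% made up for demonstration and the call can also be issued with no
% arguments, in which case random data is generated):
%
%   A = randn(100, 80);
%   [U, S, V, info] = truncated_svd(A, 12);
%   semilogy([info.gradnorm], '.-');   % gradient norm at each iteration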