function M = sympositivedefinitefactory(n)
% Manifold of n-by-n symmetric positive definite matrices with
% the bi-invariant geometry.
%
% function M = sympositivedefinitefactory(n)
%
% A point X on the manifold is represented as a symmetric positive definite
% matrix X (nxn). Tangent vectors are symmetric matrices of the same size
% (but not necessarily definite).
%
% The Riemannian metric is the bi-invariant metric, described notably in
% Chapter 6 of the 2007 book "Positive definite matrices"
% by Rajendra Bhatia, Princeton University Press.
%
% The retraction / exponential map involves expm (the matrix exponential).
% If too large a vector is retracted / exponentiated (e.g., a solver tries
% to make too big a step), this may result in NaN's in the returned point,
% which would most likely lead to NaN's in the cost / gradient / ... and
% cause the optimization to fail. For trustregions, this can be controlled
% by setting options.Delta0 and options.Delta_bar, to prevent overly large
% steps.
%
% Note also that many of the functions involve solving linear systems in X
% (a point on the manifold), taking matrix exponentials and logarithms,
% etc. It could therefore be beneficial to do some precomputation on X (an
% eigenvalue decomposition, for example) and store both X and the
% preprocessing in a structure. This would require modifying the present
% factory to work with such structures to represent both points and tangent
% vectors. We omit this in favor of simplicity, but it may be good to keep
% this in mind if efficiency becomes an issue in your application.

% This file is part of Manopt: www.manopt.org.
% Original author: Bamdev Mishra, August 29, 2013.
% Contributors: Nicolas Boumal
% Change log:
%
%   March 5, 2014 (NB)
%       There were a number of mistakes in the code owing to the tacit
%       assumption that if X and eta are symmetric, then X\eta is
%       symmetric too, which is not the case. See the discussion on the
%       Manopt forum started on Jan. 19, 2014. The functions norm, dist,
%       exp and log were modified accordingly. Furthermore, they only
%       require matrix inversion (as well as matrix log or matrix exp),
%       not matrix square roots or their inverses.
%
%   July 28, 2014 (NB)
%       The dim() function returned n*(n-1)/2 instead of n*(n+1)/2.
%       Implemented proper parallel transport from Sra and Hosseini (not
%       used by default).
%       Also added symmetrization in exp and log (to be sure).
%
%   April 3, 2015 (NB):
%       Replaced trace(A*B) by a faster equivalent that does not compute
%       the whole product A*B, for the inner product, norm and distance.
%
%   May 23, 2017 (NB):
%       As seen in a talk by Wen Huang at the SIAM Optimization Conference
%       today, replaced the retraction of this factory (which was simply
%       equal to the exponential map) with a simpler, second-order
%       retraction. That this retraction is second order can be verified
%       numerically with checkretraction(sympositivedefinitefactory(5)).
%       Notice that this retraction is cheap to evaluate for many values
%       of t, that is, it is cheap to retract many points along the same
%       tangent direction. This could in principle be exploited to speed
%       up line-searches.
%
%   Jan. 12, 2022 (NB):
%       Simplified the code for ehess2rhess by commenting out the
%       reasoning and computing only the end result.
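
% Example usage (a sketch, not part of the original file): the cost
% f(X) = trace(X) - log(det(X)) below is a hypothetical illustration;
% its minimizer over the SPD cone is the identity matrix. The fields
% problem.M, problem.cost, problem.egrad and the solver trustregions
% are standard Manopt API, and capping the trust-region radius via
% options.Delta_bar guards against the NaN issue described above.
%
%     n = 5;
%     problem.M = sympositivedefinitefactory(n);
%     problem.cost  = @(X) trace(X) - log(det(X));
%     problem.egrad = @(X) eye(n) - inv(X);
%     options.Delta_bar = 10;
%     X = trustregions(problem, options); % should approach eye(n)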
    symm = @(X) .5*(X+X');

    M.name = @() sprintf('Symmetric positive definite geometry of %dx%d matrices', n, n);

    M.dim = @() n*(n+1)/2;

    % Helpers to avoid computing full matrices simply to extract their trace.
    vec  = @(A) A(:);
    trAB = @(A, B) vec(A')'*vec(B);  % = trace(A*B)
    trAA = @(A) sqrt(trAB(A, A));    % = sqrt(trace(A^2))

    % The metric on the positive definite cone is its natural bi-invariant
    % metric, a choice motivated by the symmetry present in the space.
    % The result is equal to: trace( (X\eta) * (X\zeta) ).
    M.inner = @(X, eta, zeta) trAB(X\eta, X\zeta);

    % Notice that X\eta is *not* symmetric in general.
    % The result is equal to: sqrt(trace((X\eta)^2)).
    % Thus, we compute the sum of the squared eigenvalues of X\eta, which
    % in general is different from summing the squared singular values:
    % that is why it is not equivalent to compute the Frobenius norm here.
    % There should be no need to take the real part, but rounding errors
    % may cause a small imaginary part to appear, so we discard it.
    M.norm = @(X, eta) real(trAA(X\eta));

    % Same here: X\Y is not symmetric in general.
    % Same remark about taking the real part.
    M.dist = @(X, Y) real(trAA(real(logm(X\Y))));

    M.typicaldist = @() sqrt(n*(n+1)/2);

    M.egrad2rgrad = @egrad2rgrad;
    function eta = egrad2rgrad(X, eta)
        eta = X*symm(eta)*X;
    end

    M.ehess2rhess = @ehess2rhess;
    function Hess = ehess2rhess(X, egrad, ehess, eta)
        % Directional derivative of the Riemannian gradient:
        % Hess = X*symm(ehess)*X + 2*symm(eta*symm(egrad)*X);
        % Correction factor for the non-constant metric:
        % Hess = Hess - symm(eta*symm(egrad)*X);
        % Combined:
        Hess = X*symm(ehess)*X + symm(eta*symm(egrad)*X);
    end

    M.proj = @(X, eta) symm(eta);

    M.tangent = M.proj;
    M.tangent2ambient = @(X, eta) eta;

    M.retr = @retraction;
    function Y = retraction(X, eta, t)
        if nargin < 3
            teta = eta;
        else
            teta = t*eta;
        end
        % The symm() call is mathematically unnecessary but numerically
        % necessary.
        Y = symm(X + teta + .5*teta*(X\teta));
    end

    M.exp = @exponential;
    function Y = exponential(X, eta, t)
        if nargin < 3
            t = 1.0;
        end
        % The symm() and real() calls are mathematically unnecessary but
        % numerically necessary.
        Y = symm(X*real(expm(X\(t*eta))));
    end

    M.log = @logarithm;
    function H = logarithm(X, Y)
        % Same remark regarding the calls to symm() and real().
        H = symm(X*real(logm(X\Y)));
    end

    M.hash = @(X) ['z' hashmd5(X(:))];

    % Generate a random symmetric positive definite matrix following a
    % certain distribution. The particular choice of distribution is of
    % course arbitrary, and specific applications might require different
    % ones.
    M.rand = @random;
    function X = random()
        D = diag(1+rand(n, 1));
        [Q, R] = qr(randn(n)); %#ok
        X = Q*D*Q';
    end

    % Generate a uniformly random unit-norm tangent vector at X.
    M.randvec = @randomvec;
    function eta = randomvec(X)
        eta = symm(randn(n));
        nrm = M.norm(X, eta);
        eta = eta / nrm;
    end

    M.lincomb = @matrixlincomb;

    M.zerovec = @(X) zeros(n);

    % Poor man's vector transport: exploit the fact that all tangent spaces
    % are the set of symmetric matrices, so that the identity is a sort of
    % vector transport. It may perform poorly if the origin and target (X1
    % and X2) are far apart, though. This should not be the case for
    % typical optimization algorithms, which perform small steps.
    M.transp = @(X1, X2, eta) eta;

    % For reference, a proper vector transport is given here, following
    % work by Sra and Hosseini: "Conic geometric optimisation on the
    % manifold of positive definite matrices", SIAM J. Optim., 2015;
    % also available at http://arxiv.org/abs/1312.1039
    % This is not used by default. To force the use of this transport,
    % execute "M.transp = M.paralleltransp;" on the M returned by the
    % present factory.
    M.paralleltransp = @parallel_transport;
    function zeta = parallel_transport(X, Y, eta)
        E = sqrtm(Y/X);
        zeta = E*eta*E';
    end

    % vec and mat are not isometries, because of the unusual inner metric.
    M.vec = @(X, U) U(:);
    M.mat = @(X, u) reshape(u, n, n);
    M.vecmatareisometries = @() false;

end
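
% Numerical sanity check (a sketch; requires Manopt on the path): the
% Sra-Hosseini parallel transport should preserve the Riemannian inner
% product (it is an isometry), unlike the default identity transport:
%
%     M = sympositivedefinitefactory(4);
%     X = M.rand(); Y = M.rand();
%     u = M.randvec(X); v = M.randvec(X);
%     a = M.inner(X, u, v);
%     b = M.inner(Y, M.paralleltransp(X, Y, u), M.paralleltransp(X, Y, v));
%     % a and b should agree up to rounding errors.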