function M = sympositivedefinitefactory(n)
% Manifold of n-by-n symmetric positive definite matrices with
% the bi-invariant geometry.
%
% function M = sympositivedefinitefactory(n)
%
% A point X on the manifold is represented as a symmetric positive definite
% matrix X (nxn). Tangent vectors are symmetric matrices of the same size
% (but not necessarily definite).
%
% The Riemannian metric is the bi-invariant metric, described notably in
% Chapter 6 of the 2007 book "Positive definite matrices"
% by Rajendra Bhatia, Princeton University Press.
%
% The retraction / exponential map involves expm (the matrix exponential).
% If too large a vector is retracted / exponentiated (e.g., a solver tries
% to take too large a step), this may result in NaN's in the returned
% point, which would most likely lead to NaN's in the cost / gradient / ...
% and cause the optimization to fail. For the trustregions solver, this can
% be controlled by setting options.Delta0 and options.Delta_bar to prevent
% overly large steps.
%
% Note also that many of the functions involve solving linear systems in X
% (a point on the manifold), taking matrix exponentials and logarithms,
% etc. It could therefore be beneficial to do some precomputation on X (an
% eigenvalue decomposition, for example) and to store both X and the
% preprocessing in a structure. This would require modifying the present
% factory to work with such structures to represent both points and tangent
% vectors. We omit this in favor of simplicity, but it may be good to keep
% it in mind if efficiency becomes an issue in your application.

% This file is part of Manopt: www.manopt.org.
% Original author: Bamdev Mishra, August 29, 2013.
% Contributors: Nicolas Boumal
% Change log:
%
%   March 5, 2014 (NB)
%       There were a number of mistakes in the code owing to the tacit
%       assumption that if X and eta are symmetric, then X\eta is
%       symmetric too, which is not the case. See the discussion on the
%       Manopt forum started on Jan. 19, 2014. The functions norm, dist,
%       exp and log were modified accordingly. Furthermore, they only
%       require matrix inversion (as well as matrix log or matrix exp),
%       not matrix square roots or their inverses.
%
%   July 28, 2014 (NB)
%       The dim() function returned n*(n-1)/2 instead of n*(n+1)/2.
%       Implemented proper parallel transport from Sra and Hosseini (not
%       used by default).
%       Also added symmetrization in exp and log (to be sure).
%
%   April 3, 2015 (NB)
%       Replaced trace(A*B) by a faster equivalent that does not compute
%       the whole product A*B, for the inner product, norm and distance.
%
%   May 23, 2017 (NB)
%       As seen in a talk by Wen Huang at the SIAM Optimization Conference
%       today, replaced the retraction of this factory (which was simply
%       equal to the exponential map) with a simpler, second-order
%       retraction. That this retraction is second order can be verified
%       numerically with checkretraction(sympositivedefinitefactory(5)).
%       Notice that, for this retraction, it is cheap to retract many
%       points along the same tangent direction, that is, to evaluate it
%       for many values of t. This could in principle be exploited to
%       speed up line-searches.
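% Usage sketch (kept as comments so this file remains valid MATLAB; the
% cost function, the matrix A and the call to trustregions below are
% hypothetical illustrations, not part of this factory; only the standard
% Manopt problem fields problem.M, problem.cost and problem.egrad are
% assumed). For instance, to minimize f(X) = trace(A*X) - log(det(X))
% over the SPD cone, whose minimizer is X = inv(A):
%
%   n = 5;
%   A = randn(n); A = A*A' + eye(n);        % a symmetric PD data matrix
%   problem.M = sympositivedefinitefactory(n);
%   problem.cost  = @(X) trace(A*X) - log(det(X));
%   problem.egrad = @(X) A - inv(X);        % Euclidean gradient of f
%   X = trustregions(problem);              % Riemannian trust-regions
%
% The factory's egrad2rgrad below converts the Euclidean gradient into
% the Riemannian gradient for the bi-invariant metric automatically.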
    symm = @(X) .5*(X+X');

    M.name = @() sprintf('Symmetric positive definite geometry of %dx%d matrices', n, n);

    M.dim = @() n*(n+1)/2;

    % Helpers to avoid computing full matrices simply to extract their trace.
    vec  = @(A) A(:);
    trAB = @(A, B) vec(A')'*vec(B);  % = trace(A*B)
    trAA = @(A) sqrt(trAB(A, A));    % = sqrt(trace(A^2))

    % The choice of metric is motivated by the symmetry present in the
    % space. The metric on the positive definite cone is its natural
    % bi-invariant metric.
    % The result is equal to: trace( (X\eta) * (X\zeta) )
    M.inner = @(X, eta, zeta) trAB(X\eta, X\zeta);

    % Notice that X\eta is *not* symmetric in general.
    % The result is equal to: sqrt(trace((X\eta)^2)).
    % Thus, we compute the sum of the squared eigenvalues of X\eta, which
    % in general is different from the sum of its squared singular values:
    % that is why this is not equivalent to computing a Frobenius norm here.
    % There should be no need to take the real part, but rounding errors
    % may cause a small imaginary part to appear, so we discard it.
    M.norm = @(X, eta) real(trAA(X\eta));

    % Same here: X\Y is not symmetric in general.
    % Same remark about taking the real part.
    M.dist = @(X, Y) real(trAA(real(logm(X\Y))));

    M.typicaldist = @() sqrt(n*(n+1)/2);

    M.egrad2rgrad = @egrad2rgrad;
    function eta = egrad2rgrad(X, eta)
        eta = X*symm(eta)*X;
    end

    M.ehess2rhess = @ehess2rhess;
    function Hess = ehess2rhess(X, egrad, ehess, eta)
        % Directional derivatives of the Riemannian gradient.
        Hess = X*symm(ehess)*X + 2*symm(eta*symm(egrad)*X);

        % Correction factor for the non-constant metric.
        Hess = Hess - symm(eta*symm(egrad)*X);
    end

    M.proj = @(X, eta) symm(eta);

    M.tangent = M.proj;
    M.tangent2ambient = @(X, eta) eta;

    M.retr = @retraction;
    function Y = retraction(X, eta, t)
        if nargin < 3
            teta = eta;
        else
            teta = t*eta;
        end
        % The symm() call is mathematically unnecessary but numerically
        % necessary.
        Y = symm(X + teta + .5*teta*(X\teta));
    end

    M.exp = @exponential;
    function Y = exponential(X, eta, t)
        if nargin < 3
            t = 1.0;
        end
        % The symm() and real() calls are mathematically unnecessary but
        % numerically necessary.
        Y = symm(X*real(expm(X\(t*eta))));
    end

    M.log = @logarithm;
    function H = logarithm(X, Y)
        % Same remark regarding the calls to symm() and real().
        H = symm(X*real(logm(X\Y)));
    end

    M.hash = @(X) ['z' hashmd5(X(:))];

    % Generate a random symmetric positive definite matrix following a
    % certain distribution. The particular choice of distribution is of
    % course arbitrary; specific applications might require different ones.
    M.rand = @random;
    function X = random()
        D = diag(1 + rand(n, 1));
        [Q, R] = qr(randn(n)); %#ok
        X = Q*D*Q';
    end

    % Generate a uniformly random unit-norm tangent vector at X.
    M.randvec = @randomvec;
    function eta = randomvec(X)
        eta = symm(randn(n));
        nrm = M.norm(X, eta);
        eta = eta / nrm;
    end

    M.lincomb = @matrixlincomb;

    M.zerovec = @(X) zeros(n);

    % Poor man's vector transport: exploit the fact that all tangent
    % spaces are the set of symmetric matrices, so that the identity is a
    % sort of vector transport. It may perform poorly if the origin and
    % target (X1 and X2) are far apart, though. This should not be the
    % case for typical optimization algorithms, which take small steps.
    M.transp = @(X1, X2, eta) eta;

    % For reference, a proper vector transport is given here, following
    % work by Sra and Hosseini: "Conic geometric optimisation on the
    % manifold of positive definite matrices", SIAM J. Optim., 2015;
    % also available at http://arxiv.org/abs/1312.1039.
    % It is not used by default. To force the use of this transport,
    % execute "M.transp = M.paralleltransp;" on the M returned by the
    % present factory.
    M.paralleltransp = @parallel_transport;
    function zeta = parallel_transport(X, Y, eta)
        E = sqrtm(Y/X);
        zeta = E*eta*E';
    end

    % vec and mat are not isometries, because of the unusual inner metric.
    M.vec = @(X, U) U(:);
    M.mat = @(X, u) reshape(u, n, n);
    M.vecmatareisometries = @() false;

end
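% Example (sketch, kept as comments to leave the file valid MATLAB): as
% the comments above describe, the exact parallel transport of Sra and
% Hosseini can replace the default identity transport by overwriting
% M.transp on the returned structure; the choice of n = 4 below is
% arbitrary:
%
%   M = sympositivedefinitefactory(4);
%   M.transp = M.paralleltransp;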