function X = positive_definite_intrinsic_mean(A)
% Computes an intrinsic mean of a collection of positive definite matrices.
%
% function X = positive_definite_intrinsic_mean(A)
%
% Input: A 3D matrix A of size nxnxm such that each slice A(:, :, k) is a
%        positive definite matrix of size nxn.
%
% Output: A positive definite matrix X of size nxn which is an intrinsic
%         mean of the m matrices in A, that is, X minimizes the (scaled)
%         sum of squared Riemannian distances to the matrices in A:
%
%             f(X) = 1/(2m) * sum_{k=1}^m dist(X, A(:, :, k))^2
%
%         The distance is the one induced by the natural metric on the
%         set of positive definite matrices: see sympositivedefinitefactory.
%
% This simple example is not the best way to compute intrinsic means. Its
% purpose is to serve as base code to explore other algorithms. In
% particular, in the presence of large noise, this algorithm seems unable
% to reach points with a very small gradient norm. This may be caused by
% insufficient accuracy in the gradient computation.
%
% See also: sympositivedefinitefactory

% This file is part of Manopt and is copyrighted. See the license file.
%
% Main author: Nicolas Boumal, Sept. 3, 2013
% Contributors:
%
% Change log:
%    Sep. 15, 2022 (NB):
%        Changed name from positive_definite_karcher_mean, clarified
%        some comments.

    % Generate some random data to test the function if none is given:
    % m mild perturbations of a common reference matrix, with eigenvalues
    % clipped from below to keep each slice positive definite.
    if ~exist('A', 'var') || isempty(A)
        n = 5;
        m = 50;
        A = zeros(n, n, m);
        ref = diag(max(.1, 1 + .1*randn(n, 1)));
        for i = 1 : m
            noise = 0.01*randn(n);
            noise = (noise + noise')/2;
            [V, D] = eig(ref + noise);
            A(:, :, i) = V*diag(max(.01, diag(D)))*V';
        end
    end

    % Retrieve the size of the problem:
    % there are m matrices of size nxn to average.
    n = size(A, 1);
    m = size(A, 3);
    assert(n == size(A, 2), ...
           ['The slices of A must be square, i.e., the ' ...
            'first and second dimensions of A must be equal.']);

    % Our search space is the set of positive definite matrices of size n.
    % Notice that this is the only place where we specify on which
    % manifold we wish to compute intrinsic means. Replacing this factory
    % with another geometry yields code to compute intrinsic means on that
    % other manifold, provided that manifold is equipped with a distance
    % function dist and a logarithmic map log.
    M = sympositivedefinitefactory(n);

    % Define a problem structure, specifying the manifold M, the cost
    % function and its gradient.
    problem.M = M;
    problem.cost = @cost;
    problem.grad = @grad;

    % Explicitly pick an approximate Hessian for the trust-region method.
    % (This is only to show an example of how it can be done; the solver
    % called below, rlbfgs, does not use the approximate Hessian.
    % trustregions would use it, and if this line is omitted,
    % trustregions sets up a finite-difference approximation
    % automatically, with a default step size.)
    problem.approxhess = approxhessianFD(problem, struct('stepsize', 1e-4));

    % The functions below make many redundant computations: the cost and
    % the gradient each recompute quantities tied to the same point X.
    % This performance hit can be alleviated by using the caching system.
    % We go for a simple implementation here, as a tutorial example.
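    % As a sketch of that caching system (the cached variants below are
    % an illustration under the names cost_cached, grad_cached, and are
    % not wired into the problem structure above): when a cost or
    % gradient function accepts and returns a 'store' structure, Manopt
    % passes the same store to all calls at a given point X, so the
    % logarithmic maps can be computed once and reused. Since
    % dist(X, A_k) = M.norm(X, M.log(X, A_k)), both the cost and the
    % gradient can be expressed with the cached logs. To try them,
    % replace the assignments above with problem.cost = @cost_cached;
    % and problem.grad = @grad_cached;
    function store = prepare(X, store)
        % Compute the logarithmic maps at X once and cache them.
        if ~isfield(store, 'logs')
            store.logs = zeros(n, n, m);
            for k = 1 : m
                store.logs(:, :, k) = M.log(X, A(:, :, k));
            end
        end
    end
    function [f, store] = cost_cached(X, store)
        store = prepare(X, store);
        f = 0;
        for k = 1 : m
            % dist(X, A_k) equals the norm at X of log_X(A_k).
            f = f + M.norm(X, store.logs(:, :, k))^2;
        end
        f = f/(2*m);
    end
    function [g, store] = grad_cached(X, store)
        store = prepare(X, store);
        g = M.zerovec(X);
        for k = 1 : m
            g = M.lincomb(X, 1, g, -1/m, store.logs(:, :, k));
        end
    end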
    % Cost function: f(X) = 1/(2m) * sum_k dist(X, A(:, :, k))^2.
    function f = cost(X)
        f = 0;
        for k = 1 : m
            f = f + M.dist(X, A(:, :, k))^2;
        end
        f = f/(2*m);
    end

    % Riemannian gradient of the cost function. The gradient of the map
    % X -> .5*dist(X, A(:, :, k))^2 at X is -log_X(A(:, :, k)), so the
    % gradient of f is minus the average of the logarithmic maps.
    function g = grad(X)
        g = M.zerovec(X);
        for k = 1 : m
            % Update g with a linear combination of the form
            %     g = g - [something]/m.
            g = M.lincomb(X, 1, g, -1/m, M.log(X, A(:, :, k)));
        end
    end

    % Uncomment the lines below to run some checks on the derivatives for
    % early debugging. The slopes should agree on part of the plot at
    % least. In this case, it is sometimes necessary to inspect the plot
    % visually to make the call, but the gradient is indeed correct.
    % checkgradient(problem);
    % pause;

    % Uncomment this if you want to force using a proper parallel vector
    % transport. This is not necessary: if you omit it, the default
    % vector transport is the identity map, which is (of course) cheaper
    % and seems to perform well in practice.
    % M.transp = M.paralleltransp;

    % Issue a call to a solver. Default options are selected.
    % Our initial guess is the first data point. Most solvers work well
    % on this problem. Limited-memory BFGS is one good example:
    X = rlbfgs(problem, A(:, :, 1));

end
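% Example usage (a minimal sketch; the random data construction below is
% only an illustration of one way to build positive definite slices):
%
%     n = 4;
%     m = 10;
%     A = zeros(n, n, m);
%     for k = 1 : m
%         B = randn(n);
%         A(:, :, k) = B*B' + eye(n);   % symmetric positive definite
%     end
%     X = positive_definite_intrinsic_mean(A);
%
% Calling positive_definite_intrinsic_mean() with no input instead runs
% the built-in random test data generated at the top of this file.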