Computes a Karcher mean of a collection of positive definite matrices. function X = positive_definite_karcher_mean(A) Input: A 3D matrix A of size nxnxm such that each slice A(:,:,k) is a positive definite matrix of size nxn. Output: A positive definite matrix X of size nxn which is a Karcher mean of the m matrices in A, that is, X minimizes the sum of squared Riemannian distances to the matrices in A: f(X) = sum_k=1^m .5*dist^2(X, A(:, :, k)) The distance is defined by the natural metric on the set of positive definite matrices: dist(X,Y) = norm(logm(X\Y), 'fro'). This simple example is not the best way to compute Karcher means. Its purpose it to serve as base code to explore other algorithms. In particular, in the presence of large noise, this algorithm seems to not be able to reach points with a very small gradient norm. This may be caused by insufficient accuracy in the gradient computation.

- sympositivedefinitefactory Manifold of n-by-n symmetric positive definite matrices with
- rlbfgs Riemannian limited memory BFGS solver for smooth objective functions.
- approxhessianFD Hessian approx. fnctn handle based on finite differences of the gradient.

0001 function X = positive_definite_karcher_mean(A) 0002 % Computes a Karcher mean of a collection of positive definite matrices. 0003 % 0004 % function X = positive_definite_karcher_mean(A) 0005 % 0006 % Input: A 3D matrix A of size nxnxm such that each slice A(:,:,k) is a 0007 % positive definite matrix of size nxn. 0008 % 0009 % Output: A positive definite matrix X of size nxn which is a Karcher mean 0010 % of the m matrices in A, that is, X minimizes the sum of squared 0011 % Riemannian distances to the matrices in A: 0012 % f(X) = sum_k=1^m .5*dist^2(X, A(:, :, k)) 0013 % The distance is defined by the natural metric on the set of 0014 % positive definite matrices: dist(X,Y) = norm(logm(X\Y), 'fro'). 0015 % 0016 % This simple example is not the best way to compute Karcher means. Its 0017 % purpose it to serve as base code to explore other algorithms. In 0018 % particular, in the presence of large noise, this algorithm seems to not 0019 % be able to reach points with a very small gradient norm. This may be 0020 % caused by insufficient accuracy in the gradient computation. 0021 0022 % This file is part of Manopt and is copyrighted. See the license file. 0023 % 0024 % Main author: Nicolas Boumal, Sept. 3, 2013 0025 % Contributors: 0026 % 0027 % Change log: 0028 % 0029 0030 % Generate some random data to test the function if none is given. 0031 if ~exist('A', 'var') || isempty(A) 0032 n = 5; 0033 m = 50; 0034 A = zeros(n, n, m); 0035 ref = diag(max(.1, 1+.1*randn(n, 1))); 0036 for i = 1 : m 0037 noise = 0.01*randn(n); 0038 noise = (noise + noise')/2; 0039 [V, D] = eig(ref + noise); 0040 A(:, :, i) = V*diag(max(.01, diag(D)))*V'; 0041 end 0042 end 0043 0044 % Retrieve the size of the problem: 0045 % There are m matrices of size nxn to average. 0046 n = size(A, 1); 0047 m = size(A, 3); 0048 assert(n == size(A, 2), ... 0049 ['The slices of A must be square, i.e., the ' ... 0050 'first and second dimensions of A must be equal.']); 0051 0052 % Our search space is the set of positive definite matrices of size n. 0053 % Notice that this is the only place we specify on which manifold we 0054 % wish to compute Karcher means. Replacing this factory for another 0055 % geometry will yield code to compute Karcher means on that other 0056 % manifold, provided that manifold is equipped with a dist function and 0057 % a logarithmic map log. 0058 M = sympositivedefinitefactory(n); 0059 0060 % Define a problem structure, specifying the manifold M, the cost 0061 % function and its gradient. 0062 problem.M = M; 0063 problem.cost = @cost; 0064 problem.grad = @grad; 0065 0066 % Explicitly pick an approximate Hessian for the trust-region method. 0067 % (This is only to show an example of how it can be done; the solver 0068 % below, rlbfgs, does not use the approximate Hessian; trustregions 0069 % would.) 0070 problem.approxhess = approxhessianFD(problem, struct('stepsize', 1e-4)); 0071 0072 % The functions below make many redundant computations. This 0073 % performance hit can be alleviated by using the caching system. We go 0074 % for a simple implementation here, as a tutorial example. 0075 0076 % Cost function 0077 function f = cost(X) 0078 f = 0; 0079 for k = 1 : m 0080 f = f + M.dist(X, A(:, :, k))^2; 0081 end 0082 f = f/(2*m); 0083 end 0084 0085 % Riemannian gradient of the cost function 0086 function g = grad(X) 0087 g = M.zerovec(X); 0088 for k = 1 : m 0089 % Update g in a linear combination of the form 0090 % g = g - [something]/m. 0091 g = M.lincomb(X, 1, g, -1/m, M.log(X, A(:, :, k))); 0092 end 0093 end 0094 0095 % Execute some checks on the derivatives for early debugging. 0096 % These things can be commented out of course. 0097 % The slopes should agree on part of the plot at least. In this case, 0098 % it is sometimes necessary to inspect the plot visually to make the 0099 % call, but it is indeed correct. 0100 % checkgradient(problem); 0101 % pause; 0102 0103 % Execute this if you want to force using a proper parallel vector 0104 % transport. This is not necessary. If you omit this, the default 0105 % vector transport is the identity map, which is (of course) cheaper 0106 % and seems to perform well in practice. 0107 % M.transp = M.paralleltransp; 0108 0109 % Issue a call to a solver. Default options are selected. 0110 % Our initial guess is the first data point. Most solvers work well 0111 % with this problem. Limited-memory BFGS is one good example: 0112 X = rlbfgs(problem, A(:, :, 1)); 0113 0114 end

Generated on Mon 10-Sep-2018 11:48:06 by