Factory for matrices whose columns live on the hyperbolic manifold function M = hyperbolicfactory(n) function M = hyperbolicfactory(n, m) function M = hyperbolicfactory(n, m, transposed) Returns a structure M which describes the hyperbolic manifold in Manopt. A point on the manifold is a matrix X of size (n+1)-by-m whose columns live on the hyperbolic manifold, that is, for each column x of X, we have -x(1)^2 + x(2)^2 + x(3)^2 + ... + x(n+1)^2 = -1. The positive branch is selected by M.rand(), that is, x(1) > 0, but all tools work on the negative branch as well. Equivalently, defining the Minkowski (semi) inner product <x, y> = -x(1)y(1) + x(2)y(2) + x(3)y(3) + ... + x(n+1)y(n+1) and the induced Minkowski (semi) norm ||x||^2 = <x, x>, we can write compactly that each column of X has squared Minkowski norm equal to -1. The set of matrices X that satisfy this constraint is a smooth manifold. Tangent vectors at X are matrices U of the same size as X. If x and u are the kth columns of X and U respectively, then <x, u> = 0. This manifold is turned into a Riemannian manifold by restricting the Minkowski inner product to each tangent space (a simple calculation confirms that this metric is indeed Riemannian and not just semi Riemannian, that is, it is positive definite when restricted to each tangent space). This is the hyperbolic manifold: for m = 1, all of its sectional curvatures are equal to -1. This is called the hyperboloid or the Lorentz geometry. This manifold is an embedded submanifold of Euclidean space (the set of matrices of size (n+1)-by-m equipped with the usual trace inner product). Thus, when defining the Euclidean gradient for example (problem.egrad), it should be specified as if the function were defined in Euclidean space directly. The tool M.egrad2rgrad will automatically convert that gradient to the correct Riemannian gradient, as needed to satisfy the metric. The same is true for the Euclidean Hessian and other tools that manipulate elements in the embedding space. Importantly, the resulting manifold is /not/ a Riemannian submanifold of Euclidean space, because its metric is not obtained simply by restricting the Euclidean metric to the tangent spaces. However, it is a semi-Riemannian submanifold of Minkowski space, that is, the set of matrices of size (n+1)-by-m equipped with the Minkowski inner product. Minkowski space itself can be seen as a (linear) semi-Riemannian manifold embedded in Euclidean space. This view is entirely equivalent to the one described above (the Riemannian structure of the resulting manifold is exactly the same), and it is useful to derive some of the tools this factory provides. If transposed is set to true (it is false by default), then the matrices are transposed: a point X on the manifold is a matrix of size m-by-(n+1) and each row is an element in hyperbolic space. It is the same geometry, just a different representation. Resources: 1. Nickel and Kiela, "Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry", ICML, 2018. 2. Wilson and Leimeister, "Gradient descent in hyperbolic space", arXiv preprint arXiv:1805.08207 (2018). 3. Pennec, "Hessian of the Riemannian squared distance", HAL INRIA, 2017. Ported primarily from the McTorch toolbox at https://github.com/mctorch/mctorch. See also: poincareballfactory spherefactory obliquefactory obliquecomplexfactory
0001 function M = hyperbolicfactory(n, m, transposed) 0002 % Factory for matrices whose columns live on the hyperbolic manifold 0003 % 0004 % function M = hyperbolicfactory(n) 0005 % function M = hyperbolicfactory(n, m) 0006 % function M = hyperbolicfactory(n, m, transposed) 0007 % 0008 % Returns a structure M which describes the hyperbolic manifold in Manopt. 0009 % A point on the manifold is a matrix X of size (n+1)-by-m whose columns 0010 % live on the hyperbolic manifold, that is, for each column x of X, we have 0011 % 0012 % -x(1)^2 + x(2)^2 + x(3)^2 + ... + x(n+1)^2 = -1. 0013 % 0014 % The positive branch is selected by M.rand(), that is, x(1) > 0, but all 0015 % tools work on the negative branch as well. 0016 % 0017 % Equivalently, defining the Minkowski (semi) inner product 0018 % 0019 % <x, y> = -x(1)y(1) + x(2)y(2) + x(3)y(3) + ... + x(n+1)y(n+1) 0020 % 0021 % and the induced Minkowski (semi) norm ||x||^2 = <x, x>, we can write 0022 % compactly that each column of X has squared Minkowski norm equal to -1. 0023 % 0024 % The set of matrices X that satisfy this constraint is a smooth manifold. 0025 % Tangent vectors at X are matrices U of the same size as X. If x and u are 0026 % the kth columns of X and U respectively, then <x, u> = 0. 0027 % 0028 % This manifold is turned into a Riemannian manifold by restricting the 0029 % Minkowski inner product to each tangent space (a simple calculation 0030 % confirms that this metric is indeed Riemannian and not just semi 0031 % Riemannian, that is, it is positive definite when restricted to each 0032 % tangent space). This is the hyperbolic manifold: for m = 1, all of its 0033 % sectional curvatures are equal to -1. This is called the hyperboloid or 0034 % the Lorentz geometry. 0035 % 0036 % This manifold is an embedded submanifold of Euclidean space (the set of 0037 % matrices of size (n+1)-by-m equipped with the usual trace inner product). 0038 % Thus, when defining the Euclidean gradient for example (problem.egrad), 0039 % it should be specified as if the function were defined in Euclidean space 0040 % directly. The tool M.egrad2rgrad will automatically convert that gradient 0041 % to the correct Riemannian gradient, as needed to satisfy the metric. The 0042 % same is true for the Euclidean Hessian and other tools that manipulate 0043 % elements in the embedding space. 0044 % 0045 % Importantly, the resulting manifold is /not/ a Riemannian submanifold of 0046 % Euclidean space, because its metric is not obtained simply by restricting 0047 % the Euclidean metric to the tangent spaces. However, it is a 0048 % semi-Riemannian submanifold of Minkowski space, that is, the set of 0049 % matrices of size (n+1)-by-m equipped with the Minkowski inner product. 0050 % Minkowski space itself can be seen as a (linear) semi-Riemannian manifold 0051 % embedded in Euclidean space. This view is entirely equivalent to the one 0052 % described above (the Riemannian structure of the resulting manifold is 0053 % exactly the same), and it is useful to derive some of the tools this 0054 % factory provides. 0055 % 0056 % If transposed is set to true (it is false by default), then the matrices 0057 % are transposed: a point X on the manifold is a matrix of size m-by-(n+1) 0058 % and each row is an element in hyperbolic space. It is the same geometry, 0059 % just a different representation. 0060 % 0061 % 0062 % Resources: 0063 % 0064 % 1. Nickel and Kiela, "Learning Continuous Hierarchies in the Lorentz 0065 % Model of Hyperbolic Geometry", ICML, 2018. 0066 % 0067 % 2. Wilson and Leimeister, "Gradient descent in hyperbolic space", 0068 % arXiv preprint arXiv:1805.08207 (2018). 0069 % 0070 % 3. Pennec, "Hessian of the Riemannian squared distance", HAL INRIA, 2017. 0071 % 0072 % Ported primarily from the McTorch toolbox at 0073 % https://github.com/mctorch/mctorch. 0074 % 0075 % See also: poincareballfactory spherefactory obliquefactory obliquecomplexfactory 0076 0077 0078 % This file is part of Manopt: www.manopt.org. 0079 % Original authors: Bamdev Mishra <bamdevm@gmail.com>, Mayank Meghwanshi, 0080 % Pratik Jawanpuria, Anoop Kunchukuttan, and Hiroyuki Kasai Oct 28, 2018. 0081 % Contributors: Nicolas Boumal 0082 % Change log: 0083 % May 14, 2020 (NB): 0084 % Clarified comments about distance computation. 0085 % July 13, 2020 (NB): 0086 % Added pairmean function. 0087 0088 % Design note: all functions that are defined here but not exposed 0089 % outside work for non-transposed representations. Only the wrappers 0090 % that eventually expose functionalities handle transposition. This 0091 % makes it easier to compose functions internally. 0092 0093 if ~exist('m', 'var') || isempty(m) 0094 m = 1; 0095 end 0096 0097 if ~exist('transposed', 'var') || isempty(transposed) 0098 transposed = false; 0099 end 0100 0101 if transposed 0102 trnsp = @(X) X'; 0103 trnspstr = ', transposed'; 0104 else 0105 trnsp = @(X) X; 0106 trnspstr = ''; 0107 end 0108 0109 M.name = @() sprintf('Hyperbolic manifold H(%d, %d)%s', n, m, trnspstr); 0110 0111 M.dim = @() n*m; 0112 0113 M.typicaldist = @() sqrt(n*m); 0114 0115 % Returns a row vector q such that q(k) is the Minkowski inner product 0116 % of columns U(:, k) and V(:, k). This is defined in all of Minkowski 0117 % space, not only on tangent spaces. In particular, if X is a point on 0118 % the manifold, then inner_minkowski_columns(X, X) should return a 0119 % vector of all -1's. 0120 function q = inner_minkowski_columns(U, V) 0121 q = -U(1, :).*V(1, :) + sum(U(2:end, :).*V(2:end, :), 1); 0122 end 0123 0124 % Riemannian metric: we sum over the m copies of the hyperbolic 0125 % manifold, each equipped with a restriction of the Minkowski metric. 0126 M.inner = @(X, U, V) sum(inner_minkowski_columns(trnsp(U), trnsp(V))); 0127 0128 % Mathematically, the Riemannian metric is positive definite, hence 0129 % M.inner always returns a nonnegative number when U is tangent at X. 0130 % Numerically, because the inner product involves a difference of 0131 % positive numbers, round-off may result in a small negative number. 0132 % Taking the max against 0 avoids imaginary results. 0133 M.norm = @(X, U) sqrt(max(M.inner(X, U, U), 0)); 0134 0135 M.dist = @(X, Y) norm(dists(trnsp(X), trnsp(Y))); 0136 % This function returns a row vector of length m such that d(k) is the 0137 % geodesic distance between X(:, k) and Y(:, k). 0138 function d = dists(X, Y) 0139 % Mathematically, each column of U = X-Y has nonnegative squared 0140 % Minkowski norm. To avoid potentially imaginary results due to 0141 % round-off errors, we take the max against 0. 0142 U = X-Y; 0143 mink_sqnorms = max(0, inner_minkowski_columns(U, U)); 0144 mink_norms = sqrt(mink_sqnorms); 0145 d = 2*asinh(.5*mink_norms); 0146 % The formula above is equivalent to 0147 % d = max(0, real(acosh(-inner_minkowski_columns(X, Y)))); 0148 % but is numerically more accurate when distances are small. 0149 % When distances are large, it is better to use the acosh formula. 0150 end 0151 0152 M.proj = @(X, U) trnsp(projection(trnsp(X), trnsp(U))); 0153 function PU = projection(X, U) 0154 inners = inner_minkowski_columns(X, U); 0155 PU = U + bsxfun(@times, X, inners); 0156 end 0157 0158 M.tangent = M.proj; 0159 0160 % For Riemannian submanifolds, converting the Euclidean gradient into 0161 % the Riemannian gradient amounts to an orthogonal projection. Here 0162 % however, the manifold is not a Riemannian submanifold of Euclidean 0163 % space, hence extra corrections are required to account for the change 0164 % of metric. 0165 M.egrad2rgrad = @(X, egrad) trnsp(egrad2rgrad(trnsp(X), trnsp(egrad))); 0166 function rgrad = egrad2rgrad(X, egrad) 0167 egrad(1, :) = -egrad(1, :); 0168 rgrad = projection(X, egrad); 0169 end 0170 0171 M.ehess2rhess = @(X, egrad, ehess, U) ... 0172 trnsp(ehess2rhess(trnsp(X), trnsp(egrad), trnsp(ehess), trnsp(U))); 0173 function rhess = ehess2rhess(X, egrad, ehess, U) 0174 egrad(1, :) = -egrad(1, :); 0175 ehess(1, :) = -ehess(1, :); 0176 inners = inner_minkowski_columns(X, egrad); 0177 rhess = projection(X, bsxfun(@times, U, inners) + ehess); 0178 end 0179 0180 % For the exponential, we cannot separate trnsp() nicely from the main 0181 % function because the third input, t, is optional. 0182 M.exp = @exponential; 0183 function Y = exponential(X, U, t) 0184 X = trnsp(X); 0185 U = trnsp(U); 0186 0187 if nargin < 3 0188 tU = U; % corresponds to t = 1 0189 else 0190 tU = t*U; 0191 end 0192 0193 % Compute the individual Minkowski norms of the columns of U. 0194 mink_inners = inner_minkowski_columns(tU, tU); 0195 mink_norms = sqrt(max(0, mink_inners)); 0196 0197 % Coefficients for the exponential. For b, note that NaN's appear 0198 % when an element of mink_norms is zero, in which case the correct 0199 % convention is to define sinh(0)/0 = 1. 0200 a = cosh(mink_norms); 0201 b = sinh(mink_norms)./mink_norms; 0202 b(isnan(b)) = 1; 0203 0204 Y = bsxfun(@times, X, a) + bsxfun(@times, tU, b); 0205 0206 Y = trnsp(Y); 0207 end 0208 0209 M.retr = M.exp; 0210 0211 M.log = @(X, Y) trnsp(logarithm(trnsp(X), trnsp(Y))); 0212 function U = logarithm(X, Y) 0213 d = dists(X, Y); 0214 a = d./sinh(d); 0215 a(isnan(a)) = 1; 0216 U = projection(X, bsxfun(@times, Y, a)); 0217 end 0218 0219 M.hash = @(X) ['z' hashmd5(X(:))]; 0220 0221 M.rand = @() trnsp(myrand()); 0222 function X = myrand() 0223 X1 = randn(n, m); 0224 x0 = sqrt(1 + sum(X1.^2, 1)); % selects positive branch 0225 X = [x0; X1]; 0226 end 0227 0228 M.normalize = @(X, U) U / M.norm(X, U); 0229 M.randvec = @(X) M.normalize(X, M.proj(X, randn(size(X)))); 0230 0231 M.lincomb = @matrixlincomb; 0232 0233 M.zerovec = @(X) zeros(size(X)); 0234 0235 M.transp = @(X1, X2, U) M.proj(X2, U); 0236 0237 M.pairmean = @(x1, x2) M.exp(x1, M.log(x1, x2), .5); 0238 0239 % vec returns a vector representation of an input tangent vector which 0240 % is represented as a matrix; mat returns the original matrix 0241 % representation of the input vector representation of a tangent 0242 % vector; vec and mat are thus inverse of each other. 0243 vect = @(X) X(:); 0244 M.vec = @(X, U_mat) vect(trnsp(U_mat)); 0245 M.mat = @(X, U_vec) trnsp(reshape(U_vec, [n+1, m])); 0246 M.vecmatareisometries = @() false; 0247 0248 end