Gromov hyperbolicity measures the “tree-likeness” of a dataset. It indicates how well hierarchical embeddings such as Poincaré embeddings [1] would work on that dataset; [2] and [3] are examples of papers that use this metric. A Gromov hyperbolicity close to zero means high tree-likeness.

There are different ways to define Gromov hyperbolicity. I will present here two variants: the sampled average \(\delta_{avg}\) and the exact worst case \(\delta_{worst}\). Most papers seem to use sampling, because it is faster and, since an average is less sensitive to a few extreme quadruples than a maximum, it tends to produce lower (more tree-like) values.

Both variants are based on the four-point condition. For any four points \(x, y, u, v\), define

\[\delta(x, y, u, v) = \frac{S_1 - S_2}{2},\]

where \(S_i\) is the \(i\)th largest element of \(\{d(x, y) + d(u, v),\ d(x, u) + d(y, v),\ d(x, v) + d(y, u)\}\). Then \(\delta_{worst}\) takes the maximum of \(\delta(x, y, u, v)\) over all quadruples, while \(\delta_{avg}\) averages it over randomly sampled quadruples.
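As a quick sanity check of the formula, consider two small examples with unit-length edges: four leaves of a star (a tree), where every pairwise distance is 2, and four consecutive points \(x, y, u, v\) on a 4-cycle:

```latex
% Star: all three pairings sum to 2 + 2 = 4, so S_1 = S_2 = 4 and
\delta_{\text{star}} = \frac{S_1 - S_2}{2} = \frac{4 - 4}{2} = 0.
% 4-cycle: d(x,y)+d(u,v) = 1+1 = 2,  d(x,u)+d(y,v) = 2+2 = 4,
%          d(x,v)+d(y,u) = 1+1 = 2,  so S_1 = 4, S_2 = 2 and
\delta_{\text{cycle}} = \frac{4 - 2}{2} = 1.
```

The tree attains the ideal value of zero, while the cycle does not, matching the intuition that cycles are the obstruction to tree-likeness.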

Exact Gromov hyperbolicity

SageMath already includes a fast algorithm for computing the exact Gromov hyperbolicity of a graph. The graph can be built from an adjacency matrix; if the dataset is large, it is a good idea to use a sparse representation.
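A minimal sketch of what this looks like, assuming a SageMath session (this is Sage code, not plain Python; the small adjacency matrix is only illustrative):

```python
# Runs inside a SageMath session.
from sage.all import Graph, matrix
from sage.graphs.hyperbolicity import hyperbolicity

# Adjacency matrix of a small example graph (a 4-cycle with one chord).
A = matrix([[0, 1, 0, 1],
            [1, 0, 1, 1],
            [0, 1, 0, 1],
            [1, 1, 1, 0]])

# sparse=True keeps memory usage down for large adjacency matrices.
G = Graph(A, sparse=True)

# Returns the exact delta, a quadruple realizing it, and an upper bound.
delta, certificate, delta_UB = hyperbolicity(G)
print(delta)
```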

Sample Gromov hyperbolicity

It is not hard to implement the formula from above; this time we will use the NetworkX library. One can also replace the graph distance \(d(u, v)\) with another metric if the dataset is already embedded in a space (see [3]).
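A possible implementation of \(\delta_{avg}\) along these lines (the function name and the sample size are my own choices; precomputing all shortest-path lengths is fine for small and medium graphs but could be replaced by per-quadruple BFS for large ones):

```python
import random
import networkx as nx

def sample_hyperbolicity(G, num_samples=5000, seed=0):
    """Estimate the average Gromov hyperbolicity of a connected graph
    by averaging delta(x, y, u, v) over random quadruples of nodes."""
    rng = random.Random(seed)
    nodes = list(G.nodes)
    # Precompute all pairwise shortest-path lengths.
    dist = dict(nx.all_pairs_shortest_path_length(G))
    total = 0.0
    for _ in range(num_samples):
        x, y, u, v = rng.sample(nodes, 4)
        # The three pairings of the four-point condition, sorted descending.
        s = sorted([dist[x][y] + dist[u][v],
                    dist[x][u] + dist[y][v],
                    dist[x][v] + dist[y][u]], reverse=True)
        total += (s[0] - s[1]) / 2
    return total / num_samples

# Example: a tree should give exactly 0, a cycle something positive.
print(sample_hyperbolicity(nx.balanced_tree(2, 3)))
print(sample_hyperbolicity(nx.cycle_graph(10)))
```

To use a different metric instead of the graph distance, replace the `dist` lookups with the appropriate distance function on the embedded points.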

References

[1] M. Nickel and D. Kiela, Poincaré Embeddings for Learning Hierarchical Representations, 2017.

[2] I. Chami, Z. Ying, C. Ré and J. Leskovec, Hyperbolic Graph Convolutional Neural Networks, 2019.

[3] A. Tifrea, G. Bécigneul and O.-E. Ganea, Poincaré GloVe: Hyperbolic Word Embeddings, 2018.