Machine learning, computer vision, languages
22 Jul 2020
Gromov hyperbolicity measures the “tree-likeness” of a dataset. It indicates how well hierarchical embeddings such as Poincaré embeddings [1] would work on the data; papers that use this metric include [2] and [3]. A Gromov hyperbolicity close to zero means high tree-likeness.
Here are the results for some NLP datasets:
WordNet mammals (6540 relations): \(\delta_{avg} \approx 0.00178\) (tree)
WordNet nouns (743086 relations): \(\delta_{avg} \approx 4 \cdot 10^{-4}\) (tree)
OpenThesaurus nouns (321877 relations): \(\delta_{avg} \approx 0.307\) (not a tree)
Hyperlex (2228 relations): \(\delta_{avg} \approx 4 \cdot 10^{-4}\) (tree)
There are different ways to define Gromov hyperbolicity. I will present two variants here: the sampled \(\delta_{avg}\) and the exact \(\delta_{worst}\). Most papers seem to use sampling, because it is faster and the averaged value is less sensitive to rare worst-case quadruples.
\[\delta_{worst}(G) = \max_{x,y,u,v \in V} \frac{S_1 - S_2}{2}\] where \(S_1 \ge S_2 \ge S_3\) are the three pairwise sums \(d(x, y) + d(u, v)\), \(d(x, u) + d(y, v)\), \(d(x, v) + d(y, u)\) sorted in decreasing order.
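The sampled variant averages the per-quadruple value \((S_1 - S_2)/2\) over randomly drawn quadruples instead of maximizing over all of them. Here is a minimal sketch of that idea; the function name `delta_avg`, the sample count, and the exact averaging recipe are my assumptions (papers differ in details such as normalizing by the graph diameter):

```python
import random

def delta_avg(nodes, d, n_samples=50_000, seed=0):
    """Average Gromov delta over randomly sampled quadruples of `nodes`.

    `d(a, b)` can be any distance function (graph distance, or a metric
    on an existing embedding). This is a sketch; the exact sampling and
    averaging recipe varies between papers.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x, y, u, v = rng.sample(nodes, 4)
        # The three pairwise sums, sorted so that s[0] >= s[1] >= s[2].
        s = sorted((d(x, y) + d(u, v),
                    d(x, u) + d(y, v),
                    d(x, v) + d(y, u)), reverse=True)
        total += (s[0] - s[1]) / 2
    return total / n_samples
```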
SageMath already includes a fast algorithm for computing the exact Gromov hyperbolicity. The input is an adjacency matrix; it is a good idea to use a sparse matrix if the dataset is large.
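A minimal sketch of a Sage session (assuming `A` is a Sage adjacency matrix, sparse or dense, and that the graph is connected):

```python
# Run inside Sage, not plain Python.
from sage.graphs.hyperbolicity import hyperbolicity

# A is assumed to be a 0/1 adjacency matrix; build it with sparse=True
# for large datasets.
G = Graph(A, format='adjacency_matrix')
delta, certificate, delta_UB = hyperbolicity(G)  # exact worst-case delta
print(delta)
```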
It is not hard to implement the formula above ourselves; this time we will use the NetworkX library. One can also swap in a different distance function \(d(u, v)\) if the dataset is already embedded in a metric space (see [3]).
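Here is a minimal sketch (the helper name `delta_worst` is mine; it assumes a connected NetworkX graph and uses shortest-path distances for \(d\)):

```python
import itertools
import networkx as nx

def delta_worst(G):
    """Exact worst-case Gromov hyperbolicity of a connected graph G.

    Iterates over all quadruples of nodes, so it is only practical
    for small graphs.
    """
    # Precompute all-pairs shortest-path distances.
    d = dict(nx.all_pairs_shortest_path_length(G))
    delta = 0.0
    for x, y, u, v in itertools.combinations(G.nodes, 4):
        # The three pairwise sums, sorted so that s[0] >= s[1] >= s[2].
        s = sorted((d[x][y] + d[u][v],
                    d[x][u] + d[y][v],
                    d[x][v] + d[y][u]), reverse=True)
        delta = max(delta, (s[0] - s[1]) / 2)
    return delta
```

For example, `delta_worst(nx.balanced_tree(2, 4))` returns 0, since trees satisfy the four-point condition exactly. For an embedded dataset, replace the `d[a][b]` lookups with the embedding's metric.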
[1] M. Nickel and D. Kiela, Poincaré Embeddings for Learning Hierarchical Representations, 2017.
[2] I. Chami, Z. Ying, C. Ré and J. Leskovec, Hyperbolic Graph Convolutional Neural Networks, 2019.
[3] A. Tifrea, G. Bécigneul and O.-E. Ganea, Poincaré GloVe: Hyperbolic Word Embeddings, 2018.