This page is about our paper

Tree Sampling Divergence: An Information-Theoretic Metric for Hierarchical Graph Clustering

by Bertrand Charpentier, and Thomas Bonald
Published at the International Joint Conference on Artificial Intelligence (IJCAI), 2019

Abstract

We introduce the tree sampling divergence (TSD),an information-theoretic metric for assessing thequality of the hierarchical clustering of a graph.Any hierarchical clustering of a graph can be represented as a tree whose nodes correspond to clusters of the graph. The TSD is the Kullback-Leiblerdivergence between two probability distributionsover the nodes of this tree: those induced respectively by sampling at random edges and node pairsof the graph. A fundamental property of the proposed metric is that it is interpretable in terms ofgraph reconstruction. Specifically, it quantifies theability to reconstruct the graph from the tree interms of information loss. In particular, the TSDis maximum when perfect reconstruction is feasible, i.e., when the graph has a complete hierarchical structure. Another key property of TSD is thatit applies to any tree, not necessarily binary. In particular, the TSD can be used to compress a binarytree while minimizing the information loss in termsof graph reconstruction, so as to get a compact representation of the hierarchical structure of a graph.We illustrate the behavior of TSD compared to existing metrics on experiments based on both synthetic and real datasets.

Links

Paper Github