Talk by Guillaume Lagarde in the Distributed Algorithms seminar at LaBRI (U. Bordeaux).
Title: On Efficient Low Distortion Ultrametric Embedding
Abstract: A classic problem in unsupervised learning and data analysis is to find simple and easy-to-visualize representations of the data that still preserve its essential properties. Hierarchical clustering is one such representation that is widely used and preserves the underlying hierarchical structure of the data. The task of finding an interesting hierarchical clustering can be formalized as the task of finding an embedding of the data into an ultrametric space. The most popular algorithms are the so-called "agglomerative" ones (single linkage, average linkage, Ward's method, etc.). However these methods exhibit a prohibitive quadratic running time, making them impractical to handle large inputs. In this talk I will present a new algorithm that runs for Euclidean metric in time n^{1+1/c^2} for any constant c >= 1 and outputs a 5c-approximation of the "best" ultrametric. We complement this approach with some lower bounds and prove in particular that, in some settings, there is no 3/2-approximation running in subquadratic time. Finally I will present some empirical evaluation of the algorithm on some classic machine learning datasets. Joint work with Vincent Cohen-Addad and Karthik C. S.
Informations
- Arnaud Casteigts (acasteig@u-bordeaux.fr)
- 8 décembre 2020 13:32
- Conférence
- Anglais