Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Finding a small set of features that still retains sufficient information about the dataset is important for the successful application of many statistical learning approaches.
We introduce an approach that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This test can be used to identify the most informative distance measure out of a pool of candidates, to compare the representations in deep neural networks, and to infer causality in high-dimensional dynamic processes and time series.
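The abstract does not spell out the statistic behind this test, so the following is only an illustrative sketch of one way such a comparison could be made, assuming a rank-based approach: for each point, find its nearest neighbor under distance A and record that neighbor's rank under distance B. A value near 0 suggests that A retains the information in B, and a value near 1 suggests that A says little about B. The function names `neighbor_ranks` and `rank_imbalance`, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def neighbor_ranks(X):
    """ranks[i, j] = rank of point j among the neighbors of i (1 = nearest).

    Uses Euclidean distance on the columns of X (an assumption for this sketch).
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)           # a point is never its own neighbor
    order = np.argsort(dist, axis=1)         # neighbor indices, nearest first
    n = X.shape[0]
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)[None, :]
    return ranks

def rank_imbalance(X_a, X_b):
    """Scaled mean rank, under distance B, of each point's nearest neighbor under A."""
    n = X_a.shape[0]
    ranks_a = neighbor_ranks(X_a)
    ranks_b = neighbor_ranks(X_b)
    nearest_a = np.argmin(ranks_a, axis=1)   # index of the rank-1 neighbor under A
    return 2.0 / n * ranks_b[np.arange(n), nearest_a].mean()

# Toy data: distance A uses both features, distance B uses only the first one.
rng = np.random.default_rng(0)
X_full = rng.uniform(size=(500, 2))
X_first = X_full[:, :1]

a_to_b = rank_imbalance(X_full, X_first)     # expected to be small
b_to_a = rank_imbalance(X_first, X_full)     # expected to be larger
print(f"A -> B: {a_to_b:.3f}   B -> A: {b_to_a:.3f}")
```

In this toy setting the asymmetry between the two directions is what would indicate that the distance built on both features is more informative than the one built on a single feature; two equivalent distances would give small values in both directions, and two independent ones would give values near 1 in both.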
This colloquium is organized around data sciences in a broad sense, with the goal of bringing together researchers with diverse backgrounds (including mathematics, computer science, physics, chemistry and neuroscience) but a common interest in dealing with complex, large-scale, or high-dimensional data.
These seminars are made possible through the support of the CFM-ENS Chair « Modèles et Sciences des Données ».
The organizers: Giulio Biroli, Alex Cayco Gajic, Bruno Loureiro, Stéphane Mallat, Gabriel Peyré.