Topological Information Data Analysis
July 06, 2019 · Declared Dead · Entropy
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Pierre Baudot, Monica Tapia, Daniel Bennequin, Jean-Marc Goaillard
arXiv ID
1907.04242
Category
stat.OT
Cross-listed
cs.IT, q-bio.NC
Citations
52
Venue
Entropy
Last Checked
3 months ago
Abstract
This paper presents methods that quantify the structure of statistical interactions within a given data set; the methods were first used in \cite{Tapia2018}. It establishes new results on the k-multivariate mutual-informations (I_k) inspired by the topological formulation of Information introduced in. In particular, we show that for n random variables, the vanishing of all I_k with 2 \leq k \leq n is equivalent to their statistical independence. Pursuing the work of Hu Kuo Ting and Te Sun Han, we show that information functions provide coordinates for binary variables, and that they are analytically independent on the probability simplex for any set of finite variables. The maximal positive I_k identifies the variables that co-vary the most in the population, whereas the minimal negative I_k identifies synergistic clusters and the variables that most strongly differentiate and segregate the population. Finite data size effects and estimation biases severely constrain the effective computation of the information topology on data, and we provide simple statistical tests for the undersampling bias and the k-dependences following. As an example, we apply these methods to genetic expression and unsupervised cell-type classification. The methods uncover biologically relevant subtypes with few errors from a sample of 41 genes. This establishes generic basic methods to quantify epigenetic information storage, along with a unified epigenetic unsupervised learning formalism. We propose that higher-order statistical interactions and non-identically distributed variables are constitutive characteristics of biological systems that should be estimated in order to unravel their significant statistical structure and diversity. The topological information data analysis presented here allows precise estimation of this higher-order structure, which is characteristic of biological systems.
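As an illustration of the I_k quantity named in the abstract, the following minimal sketch computes multivariate mutual information from the standard alternating sum of joint entropies, I_k = \sum_{i=1}^{k} (-1)^{i-1} \sum_{|S|=i} H(X_S). This is not the authors' code: the function names `joint_entropy` and `mutual_information_k` are hypothetical, the entropies are naive plug-in estimates, and none of the undersampling tests mentioned in the abstract are included.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(samples, idx):
    """Plug-in Shannon entropy (in bits) of the columns listed in idx."""
    counts = Counter(tuple(row[i] for i in idx) for row in samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information_k(samples, idx):
    """I_k of the columns in idx: alternating sum of joint entropies
    over all non-empty subsets of idx (hypothetical helper name)."""
    total = 0.0
    for size in range(1, len(idx) + 1):
        sign = (-1) ** (size - 1)
        for subset in combinations(idx, size):
            total += sign * joint_entropy(samples, subset)
    return total


# Toy data sets, each row listed once so the empirical distribution is uniform.
xor_rows = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]  # Z = X xor Y
copy_rows = [(x, x, x) for x in (0, 1)]                     # X = Y = Z

i3_xor = mutual_information_k(xor_rows, (0, 1, 2))    # negative: synergy
i3_copy = mutual_information_k(copy_rows, (0, 1, 2))  # positive: redundancy
i2_xor = mutual_information_k(xor_rows, (0, 1))       # zero: pairwise independent
```

On the XOR rows every pairwise I_2 is zero while I_3 is negative, so the three variables are pairwise independent but not jointly independent. This is consistent with the abstract's statement that independence requires all I_k for 2 \leq k \leq n to vanish, not only the pairwise terms.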
Similar Papers
In the same crypt: stat.OT
Approaching Ethical Guidelines for Data Scientists · R.I.P. · 👻 Ghosted
Aggregating incoherent agents who disagree · R.I.P. · 👻 Ghosted
Product risk assessment: a Bayesian network approach · R.I.P. · 👻 Ghosted
A Fourier-invariant method for locating point-masses and computing their attributes · R.I.P. · 👻 Ghosted
Can everyday AI be ethical. Fairness of Machine Learning Algorithms · R.I.P. · 👻 Ghosted
Died the same way: 👻 Ghosted
Federated Learning: Strategies for Improving Communication Efficiency · R.I.P. · 👻 Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit · R.I.P. · 👻 Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning · R.I.P. · 👻 Ghosted