Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling
January 09, 2018 Β· Declared Dead Β· π Mathematical programming
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Deeparnab Chakrabarty, Sanjeev Khanna
arXiv ID
1801.02790
Category
cs.DS: Data Structures & Algorithms
Citations
53
Venue
Mathematical programming
Last Checked
3 months ago
Abstract
Given a non-negative $n \times m$ real matrix $A$, the {\em matrix scaling} problem is to determine if it is possible to scale the rows and columns so that each row and each column sums to a specified target value for it. This problem arises in many algorithmic applications, perhaps most notably as a preconditioning step in solving a linear system of equations. One of the most natural and by now classical approach to matrix scaling is the Sinkhorn-Knopp algorithm (also known as the RAS method) where one alternately scales either all rows or all columns to meet the target values. In addition to being extremely simple and natural, another appeal of this procedure is that it easily lends itself to parallelization. A central question is to understand the rate of convergence of the Sinkhorn-Knopp algorithm. In this paper, we present an elementary convergence analysis for the Sinkhorn-Knopp algorithm that improves upon the previous best bound. In a nutshell, our approach is to show a simple bound on the number of iterations needed so that the KL-divergence between the current row-sums and the target row-sums drops below a specified threshold $Ξ΄$, and then connect the KL-divergence with $\ell_1$ and $\ell_2$ distances. For $\ell_1$, we can use Pinsker's inequality. For $\ell_2$, we develop a strengthening of Pinsker's inequality, called (KL vs $\ell_1/\ell_2$) in the paper, which lower bounds the KL-divergence by a combination of $\ell_1$ and $\ell_2$ distance. This inequality may be of independent interest. The idea of studying Sinkhorn-Knopp convergence via KL-divergence is not new and has indeed been previously explored. Our contribution is an elementary, self-contained presentation of this approach and an interesting new inequality that yields a significantly stronger convergence guarantee for the extensively studied $\ell_2$-error.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted