๐ฎ
๐ฎ
The Ethereal
On the compressiveness of the Burrows-Wheeler transform
November 18, 2024 ยท The Ethereal ยท ๐ Annual Symposium on Combinatorial Pattern Matching
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Hideo Bannai, Tomohiro I, Yuto Nakashima
arXiv ID
2411.11298
Category
cs.DM: Discrete Mathematics
Cross-listed
cs.DS
Citations
2
Venue
Annual Symposium on Combinatorial Pattern Matching
Last Checked
2 months ago
Abstract
The Burrows-Wheeler transform (BWT) is a reversible transform that converts a string $w$ into another string $\mathsf{BWT}(w)$. The size of the run-length encoded BWT (RLBWT) can be interpreted as a measure of repetitiveness in the class of representations called dictionary compression which are essentially representations based on copy and paste operations. In this paper, we shed new light on the compressiveness of BWT and the bijective BWT (BBWT). We first extend previous results on the relations of their run-length compressed sizes $r$ and $r_B$. We also show that the so-called ``clustering effect'' of BWT and BBWT can be captured by measures other than empirical entropy or run-length encoding. In particular, we show that BWT and BBWT do not increase the repetitiveness of the string with respect to various measures based on dictionary compression by more than a polylogarithmic factor. Furthermore, we show that there exists an infinite family of strings that are maximally incompressible by any dictionary compression measure, but become very compressible after applying BBWT. An interesting implication of this result is that it is possible to transcend dictionary compression in some cases by simply applying BBWT before applying dictionary compression.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Discrete Mathematics
๐ฎ
๐ฎ
The Ethereal
An Introduction to Temporal Graphs: An Algorithmic Perspective
๐ฎ
๐ฎ
The Ethereal
Guarantees for Greedy Maximization of Non-submodular Functions with Applications
๐ฎ
๐ฎ
The Ethereal
A note on the triangle inequality for the Jaccard distance
๐ฎ
๐ฎ
The Ethereal
Fast clique minor generation in Chimera qubit connectivity graphs
๐ฎ
๐ฎ
The Ethereal