Substring Complexity in Sublinear Space
July 16, 2020 Β· Declared Dead Β· π International Symposium on Algorithms and Computation
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Giulia Bernardini, Gabriele Fici, PaweΕ Gawrychowski, Solon P. Pissis
arXiv ID
2007.08357
Category
cs.DS: Data Structures & Algorithms
Citations
7
Venue
International Symposium on Algorithms and Computation
Last Checked
4 months ago
Abstract
Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of strings, e.g., the size $z$ of the Lempel-Ziv parse or the number $r$ of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size $Ξ³$ of a smallest string attractor. Let $T$ be a string of length $n$. A string attractor of $T$ is a set of positions of $T$ capturing the occurrences of all the substrings of $T$. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing $Ξ³$ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure of compressibility that is based on the function $S_T(k)$ counting the number of distinct substrings of length $k$ of $T$, also known as the substring complexity of $T$. This new measure is defined as $Ξ΄= \sup\{S_T(k)/k, k\geq 1\}$ and lower bounds all the relevant ad hoc measures previously considered. In particular, $Ξ΄\leq Ξ³$ always holds and $Ξ΄$ can be computed in $\mathcal{O}(n)$ time using $Ξ(n)$ working space. Kociumaka et al. showed that one can construct an $\mathcal{O}(Ξ΄\log \frac{n}Ξ΄)$-sized representation of $T$ supporting efficient direct access and efficient pattern matching queries on $T$. Given that for highly compressible strings, $Ξ΄$ is significantly smaller than $n$, it is natural to pose the following question: Can we compute $Ξ΄$ efficiently using sublinear working space? We address this algorithmic challenge by showing the following bounds to compute $Ξ΄$: $\mathcal{O}(\frac{n^3\log b}{b^2})$ time using $\mathcal{O}(b)$ space, for any $b\in[1,n]$, in the comparison model; or $\tilde{\mathcal{O}}(n^2/b)$ time using $\tilde{\mathcal{O}}(b)$ space, for any $b\in[\sqrt{n},n]$, in the word RAM model.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted