Compressed Multiple Pattern Matching
November 03, 2018 · Declared Dead · Annual Symposium on Combinatorial Pattern Matching
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors: Dmitry Kosolobov, Nikita Sivukhin
arXiv ID: 1811.01248
Category: cs.DS (Data Structures & Algorithms)
Citations: 6
Venue: Annual Symposium on Combinatorial Pattern Matching
Last Checked: 4 months ago
Abstract
Given $d$ strings over the alphabet $\{0,1,\ldots,\sigma{-}1\}$, the classical Aho--Corasick data structure allows us to find all $occ$ occurrences of the strings in any text $T$ in $O(|T| + occ)$ time using $O(m\log m)$ bits of space, where $m$ is the number of edges in the trie containing the strings. Fix any constant $\varepsilon \in (0, 2)$. We describe a compressed solution for the problem that, provided $\sigma \le m^\delta$ for a constant $\delta < 1$, works in $O(|T| \frac{1}{\varepsilon} \log\frac{1}{\varepsilon} + occ)$ time, which is $O(|T| + occ)$ since $\varepsilon$ is constant, and occupies $mH_k + 1.443 m + \varepsilon m + O(d\log\frac{m}{d})$ bits of space, for all $0 \le k \le \max\{0,\alpha\log_\sigma m - 2\}$ simultaneously, where $\alpha \in (0,1)$ is an arbitrary constant and $H_k$ is the $k$th-order empirical entropy of the trie. Hence, we reduce the $3.443m$ term in the space bounds of previously best succinct solutions to $(1.443 + \varepsilon)m$, thus solving an open problem posed by Belazzougui. Further, we notice that $L = \log\binom{\sigma(m+1)}{m} - O(\log(\sigma m))$ is a worst-case space lower bound for any solution of the problem and, for $d = o(m)$ and constant $\varepsilon$, our approach achieves $L + \varepsilon m$ bits of space, which gives evidence that, for $d = o(m)$, the space of our data structure is theoretically optimal up to the $\varepsilon m$ additive term and it is hardly possible to eliminate the term $1.443m$. In addition, we refine the space analysis of previous works by proposing a more appropriate definition for $H_k$. We also simplify the construction for practice by adapting the fixed block compression boosting technique, then implement our data structure, and conduct a number of experiments showing that it is comparable to the state of the art in terms of time and is superior in space.
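For orientation, the classical (uncompressed) Aho--Corasick baseline that the abstract starts from can be sketched as follows. This is a minimal illustration, not the compressed data structure of the paper. The function names `build_automaton` and `find_all` are hypothetical, the trie is stored in plain Python dictionaries, and output lists are merged along failure links for brevity, so the space usage here is well above the bounds discussed in the abstract. Patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build a classical Aho-Corasick automaton (illustrative sketch only).

    Returns (goto, fail, out), where node 0 is the root, goto[v] maps a
    symbol to a child of v, fail[v] is the failure link of v, and out[v]
    lists the indices of all patterns that end at node v.
    """
    goto, fail, out = [{}], [0], [[]]
    # Insert every pattern into the trie; the trie has len(goto) - 1 edges.
    for idx, pat in enumerate(patterns):
        v = 0
        for c in pat:
            if c not in goto[v]:
                goto[v][c] = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
            v = goto[v][c]
        out[v].append(idx)
    # Breadth-first traversal: depth-1 nodes keep the root as failure link.
    queue = deque(goto[0].values())
    while queue:
        v = queue.popleft()
        for c, u in goto[v].items():
            f = fail[v]
            while f and c not in goto[f]:
                f = fail[f]
            fail[u] = goto[f].get(c, 0)
            # Assumption for brevity: copy outputs instead of keeping
            # separate "report" links, which costs extra space.
            out[u] = out[u] + out[fail[u]]
            queue.append(u)
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_position, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    hits = []
    v = 0
    for i, c in enumerate(text):
        while v and c not in goto[v]:
            v = fail[v]
        v = goto[v].get(c, 0)
        for idx in out[v]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


if __name__ == "__main__":
    print(sorted(find_all("ushers", ["he", "she", "his", "hers"])))
    # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

In this sketch, `len(goto) - 1` is the quantity $m$ from the abstract (9 edges for the four patterns above), and the scan over the text takes $O(|T| + occ)$ steps once the automaton is built.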
Similar Papers
In the same crypt: Data Structures & Algorithms
Route Planning in Transportation Networks (The Cartographer; R.I.P., Ghosted)
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration (R.I.P., Ghosted)
Hierarchical Clustering: Objective Functions and Algorithms (R.I.P., Ghosted)
Graph Isomorphism in Quasipolynomial Time (R.I.P., Ghosted)
Simulation optimization: A review of algorithms and applications (The Cartographer)
Died the same way: Ghosted
Federated Learning: Strategies for Improving Communication Efficiency (R.I.P., Ghosted)
In-Datacenter Performance Analysis of a Tensor Processing Unit (R.I.P., Ghosted)
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning (R.I.P., Ghosted)