Probabilistic Threshold Indexing for Uncertain Strings
September 29, 2015 Β· Declared Dead Β· π International Conference on Extending Database Technology
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Sharma V. Thankachan, Manish Patil, Rahul Shah, Sudip Biswas
arXiv ID
1509.08608
Category
cs.DB: Databases
Cross-listed
cs.DS
Citations
15
Venue
International Conference on Extending Database Technology
Last Checked
4 months ago
Abstract
Strings form a fundamental data type in computer systems. String searching has been extensively studied since the inception of computer science. Increasingly many applications have to deal with imprecise strings or strings with fuzzy information in them. String matching becomes a probabilistic event when a string contains uncertainty, i.e. each position of the string can have different probable characters with associated probability of occurrence for each character. Such uncertain strings are prevalent in various applications such as biological sequence data, event monitoring and automatic ECG annotations. We explore the problem of indexing uncertain strings to support efficient string searching. In this paper we consider two basic problems of string searching, namely substring searching and string listing. In substring searching, the task is to find the occurrences of a deterministic string in an uncertain string. We formulate the string listing problem for uncertain strings, where the objective is to output all the strings from a collection of strings, that contain probable occurrence of a deterministic query string. Indexing solution for both these problems are significantly more challenging for uncertain strings than for deterministic strings. Given a construction time probability value $Ο$, our indexes can be constructed in linear space and supports queries in near optimal time for arbitrary values of probability threshold parameter greater than $Ο$. To the best of our knowledge, this is the first indexing solution for searching in uncertain strings that achieves strong theoretical bound and supports arbitrary values of probability threshold parameter. We also propose an approximate substring search index that can answer substring search queries with an additive error in optimal time. We conduct experiments to evaluate the performance of our indexes.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Databases
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Untangling Blockchain: A Data Processing View of Blockchain Systems
R.I.P.
π»
Ghosted
Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades
R.I.P.
π»
Ghosted
BLOCKBENCH: A Framework for Analyzing Private Blockchains
R.I.P.
π»
Ghosted
Data Synthesis based on Generative Adversarial Networks
R.I.P.
π»
Ghosted
HoloClean: Holistic Data Repairs with Probabilistic Inference
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted