conLSH: Context based Locality Sensitive Hashing for Mapping of noisy SMRT Reads

March 11, 2019 · Declared Dead · 🏛 bioRxiv

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Angana Chakraborty, Sanghamitra Bandyopadhyay

arXiv ID: 1903.04925

Category: q-bio.GN

Cross-listed: cs.DS, cs.LG, stat.ML

Citations: 8

Venue: bioRxiv

Last Checked: 3 months ago

Abstract

Single Molecule Real-Time (SMRT) sequencing is a recent advancement in next-generation sequencing technology developed by Pacific Biosciences (PacBio). It produces an explosion of long, noisy reads that demand cutting-edge research to get the most out of them. To deal with the high error probability of SMRT data, this article proposes a novel contextual Locality Sensitive Hashing (conLSH) based algorithm that can effectively align noisy SMRT reads to the reference genome. Here, sequences are hashed together based not only on their closeness but also on the similarity of their context. The algorithm has an $\mathcal{O}(n^{\rho+1})$ space requirement, where $n$ is the number of sequences in the corpus and $\rho$ is a constant. The indexing time and querying time are bounded by $\mathcal{O}\left(\frac{n^{\rho+1} \cdot \ln n}{\ln \frac{1}{P_2}}\right)$ and $\mathcal{O}(n^{\rho})$ respectively, where $P_2 > 0$ is a probability value. This algorithm is particularly useful for retrieving similar sequences, a widely used task in biology. The proposed conLSH-based aligner is compared with rHAT, a popular aligner for SMRT reads, and is found to outperform it in both speed and memory requirements. In particular, it takes approximately $24.2\%$ less processing time while saving about $70.3\%$ in peak memory on the H. sapiens PacBio dataset.
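The idea of hashing sequences by both closeness and context can be illustrated with a small sketch. This is not the authors' implementation: the function names (`make_context_hash`, `build_index`, `query`), the parameters (`num_positions`, `context`), and the toy sequences are all assumptions made for illustration. Each hash function samples a few random positions and keys a sequence on the bases at those positions together with their neighbours, so two sequences collide only if they agree on the sampled bases and on the surrounding context.

```python
import random
from collections import defaultdict


def make_context_hash(seq_len, num_positions, context, rng):
    """Build one hash function (hypothetical helper, not from the paper).

    It samples `num_positions` positions and keys a sequence on each sampled
    base plus `context` neighbouring bases on either side.
    """
    lo, hi = context, seq_len - context - 1
    positions = sorted(rng.sample(range(lo, hi + 1), num_positions))

    def h(seq):
        # Each sampled position contributes a window of 2 * context + 1 bases.
        return tuple(seq[p - context:p + context + 1] for p in positions)

    return h


def build_index(sequences, hashes):
    """Store every sequence index in one bucket per hash table."""
    tables = [defaultdict(list) for _ in hashes]
    for idx, seq in enumerate(sequences):
        for table, h in zip(tables, hashes):
            table[h(seq)].append(idx)
    return tables


def query(seq, tables, hashes):
    """Return indices of sequences sharing a bucket with `seq` in any table."""
    candidates = set()
    for table, h in zip(tables, hashes):
        candidates.update(table.get(h(seq), []))
    return candidates


# Toy data (assumed): three equal-length reference substrings.
rng = random.Random(0)
reference = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTTTTTTTTT"]
hashes = [make_context_hash(12, 2, 1, rng) for _ in range(4)]
tables = build_index(reference, hashes)
hits = query("ACGTACGTACGT", tables, hashes)
print(sorted(hits))
```

Using several independent tables raises the chance that a noisy read shares at least one bucket with its true origin, while widening `context` makes each bucket more selective. A real aligner would then verify the candidates with a proper alignment step, which this sketch omits.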


📜 Similar Papers

In the same crypt — q-bio.GN

R.I.P. 👻 Ghosted

DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier

Maxat Kulmanov, Mohammed Asif Khan, Robert Hoehndorf

q-bio.GN 🏛 Bioinform. 📚 443 cites 9 years ago

R.I.P. 👻 Ghosted

Accurate Genomic Prediction Of Human Height

Louis Lello, Steven G. Avery, ... (+4 more)

q-bio.GN 🏛 Genetics 📚 154 cites 8 years ago

R.I.P. 👻 Ghosted

Synergistic Drug Combination Prediction by Integrating Multi-omics Data in Deep Learning Models

Tianyu Zhang, Liwei Zhang, ... (+2 more)

q-bio.GN 🏛 Methods in molecular biology 📚 120 cites 7 years ago

🌅 Old Age

GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping

Mohammed Alser, Hasan Hassan, ... (+4 more)

q-bio.GN 🏛 Bioinform. 📚 116 cites 10 years ago

R.I.P. 👻 Ghosted

Tasks, Techniques, and Tools for Genomic Data Visualization

Sabrina Nusrat, Theresa Harbig, Nils Gehlenborg

q-bio.GN 🏛 CGF 📚 104 cites 7 years ago

🌅 Old Age

Spaced seeds improve k-mer-based metagenomic classification

Karel Brinda, Maciej Sykulski, Gregory Kucherov

q-bio.GN 🏛 Bioinform. 📚 93 cites 11 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago