Estimating Sequence Similarity from Read Sets for Clustering Next-Generation Sequencing data

May 16, 2017 · Declared Dead · 🏛 Data mining and knowledge discovery

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Petr Ryšavý, Filip Železný
arXiv ID: 1705.06125
Category: cs.DS (Data Structures & Algorithms)
Cross-listed: q-bio.GN
Citations: 7
Venue: Data mining and knowledge discovery
Last Checked: 4 months ago

Abstract

To cluster sequences given only their read-set representations, one may try to reconstruct each sequence from its read set and then apply conventional (dis)similarity measures, such as the edit distance, to the assembled sequences. This approach is, however, problematic, and we propose instead to estimate the similarities directly from the read sets. Our approach is based on an adaptation of the Monge-Elkan similarity known from the field of databases. It avoids the NP-hard problem of sequence assembly. For low-coverage data, it yields a better approximation of the true sequence similarities, and consequently better clustering, than the first-assemble-then-cluster approach.
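To make the idea of comparing read sets directly more concrete, here is a minimal Python sketch of the classic Monge-Elkan scheme applied to two read sets, written in its distance form: for each read in one set, take the closest read in the other set, then average. This is not the authors' adaptation, whose details the abstract does not give. The function names, the use of plain Levenshtein distance between reads, and the symmetrization by averaging both directions are all assumptions made for illustration.

```python
def edit_distance(s, t):
    """Levenshtein distance between two strings (two-row dynamic program)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def monge_elkan_distance(reads_a, reads_b, dist=edit_distance):
    """Directed Monge-Elkan-style distance (hypothetical helper).

    For every read in reads_a, find the smallest distance to any read in
    reads_b, then return the mean of those minima.
    """
    if not reads_a or not reads_b:
        raise ValueError("both read sets must be non-empty")
    total = sum(min(dist(a, b) for b in reads_b) for a in reads_a)
    return total / len(reads_a)


def symmetric_monge_elkan(reads_a, reads_b, dist=edit_distance):
    """Average of both directions; an assumed way to symmetrize the measure."""
    return 0.5 * (monge_elkan_distance(reads_a, reads_b, dist)
                  + monge_elkan_distance(reads_b, reads_a, dist))


if __name__ == "__main__":
    set_a = ["ACGT", "GGTA"]
    set_b = ["ACGA", "GGTA"]
    print(symmetric_monge_elkan(set_a, set_b))
```

The resulting pairwise values could be fed to any distance-based clustering method. Note that this naive version compares every read pair, so its cost grows with the product of the two read-set sizes.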


📜 Similar Papers

In the same crypt — Data Structures & Algorithms

📚 The Cartographer

Relief-Based Feature Selection: Introduction and Review

Ryan J. Urbanowicz, Melissa Meeker, ... (+3 more)

cs.DS 🏛 J.BI 📚 1.1K cites 8 years ago

R.I.P. 👻 Ghosted

Route Planning in Transportation Networks

Hannah Bast, Daniel Delling, ... (+6 more)

cs.DS 🏛 Algorithm Engineering 📚 759 cites 11 years ago

R.I.P. 👻 Ghosted

Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration

Jason Altschuler, Jonathan Weed, Philippe Rigollet

cs.DS 🏛 NeurIPS 📚 654 cites 9 years ago

R.I.P. 👻 Ghosted

Hierarchical Clustering: Objective Functions and Algorithms

Vincent Cohen-Addad, Varun Kanade, ... (+2 more)

cs.DS 🏛 SODA 📚 637 cites 9 years ago

R.I.P. 👻 Ghosted

Graph Isomorphism in Quasipolynomial Time

László Babai

cs.DS 🏛 STOC 📚 616 cites 10 years ago

📚 The Cartographer

Simulation optimization: A review of algorithms and applications

Satyajith Amaran, Nikolaos V. Sahinidis, ... (+2 more)

cs.DS 🏛 4OR 📚 588 cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago