Estimating Sequence Similarity from Read Sets for Clustering Next-Generation Sequencing data

May 16, 2017 · Declared Dead · 🏛 Data mining and knowledge discovery

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Petr Ryšavý, Filip Železný
arXiv ID: 1705.06125
Category: cs.DS (Data Structures & Algorithms)
Cross-listed: q-bio.GN
Citations: 7
Venue: Data mining and knowledge discovery
Last Checked: 4 months ago
Abstract
To cluster sequences given only their read-set representations, one may try to reconstruct each one from the corresponding read set, and then employ conventional (dis)similarity measures such as the edit distance on the assembled sequences. This approach is however problematic and we propose instead to estimate the similarities directly from the read sets. Our approach is based on an adaptation of the Monge-Elkan similarity known from the field of databases. It avoids the NP-hard problem of sequence assembly. For low coverage data it results in a better approximation of the true sequence similarities and consequently in better clustering, in comparison to the first-assemble-then-cluster approach.
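The idea of comparing read sets directly can be illustrated with a small sketch. The code below is not the authors' method: it is a plain Monge-Elkan-style measure, written as a distance (mean of per-read minimum edit distances) rather than the usual similarity form, and symmetrized by averaging both directions. The function names `edit_distance`, `directed_distance`, and `read_set_distance` are hypothetical, and the sketch ignores details a real adaptation would have to handle, such as reverse complements, read overlaps, and coverage.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def directed_distance(reads_a: Sequence[str], reads_b: Sequence[str]) -> float:
    """Mean over reads in A of the smallest edit distance to any read in B."""
    if not reads_a or not reads_b:
        raise ValueError("both read sets must be non-empty")
    total = sum(min(edit_distance(r, s) for s in reads_b) for r in reads_a)
    return total / len(reads_a)


def read_set_distance(reads_a: Sequence[str], reads_b: Sequence[str]) -> float:
    """Symmetrized Monge-Elkan-style distance (an illustrative assumption)."""
    return 0.5 * (directed_distance(reads_a, reads_b)
                  + directed_distance(reads_b, reads_a))
```

A matrix of such pairwise values could then be passed to any distance-based clustering routine, with no assembly step in between. The cost is quadratic in the number of reads per pair of sets, so this form is only practical for small inputs.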
