Defining Reference Sequences for Nocardia Species by Similarity and Clustering Analyses of 16S rRNA Gene Sequence Data
November 29, 2023 Β· Declared Dead Β· π PLoS ONE
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Manal Helal, Fanrong Kong, Sharon C. A. Chen, Michael Bain, Richard Christen, Vitali Sintchenko
arXiv ID
2311.17965
Category
q-bio.GN
Cross-listed
cs.LG
Citations
13
Venue
PLoS ONE
Last Checked
3 months ago
Abstract
The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. Results: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β q-bio.GN
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Accurate Genomic Prediction Of Human Height
R.I.P.
π»
Ghosted
Synergistic Drug Combination Prediction by Integrating Multi-omics Data in Deep Learning Models
π
π
Old Age
GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping
R.I.P.
π»
Ghosted
Tasks, Techniques, and Tools for Genomic Data Visualization
π
π
Old Age
Spaced seeds improve k-mer-based metagenomic classification
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted