Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification
November 01, 2020 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Xiangxie Zhang, Ben Beinke, Berlian Al Kindhi, Marco Wiering
arXiv ID
2011.00485
Category
q-bio.OT
Cross-listed
cs.AI, cs.LG
Citations
12
Venue
arXiv.org
Last Checked
3 months ago
Abstract
The classification of DNA sequences is a key research area in bioinformatics as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification. Furthermore, we introduce a novel feature extraction method based on the Levenshtein distance and randomly generated DNA sub-sequences to compute information-rich features from the DNA sequences. We also use an existing feature extraction method based on 3-grams to represent amino acids and combine both feature extraction methods with a multitude of machine learning algorithms. Four different datasets, each concerning viral diseases such as Covid-19, AIDS, Influenza, and Hepatitis C, are used for evaluating the different approaches. The results of the experiments show that all methods obtain high accuracies on the different DNA datasets. Furthermore, the domain-specific 3-gram feature extraction method leads in general to the best results in the experiments, while the newly proposed technique outperforms all other methods on the smallest Covid-19 dataset.
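The Levenshtein-based feature extraction mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the distances are computed or how many sub-sequences are used, so everything here is an assumption made for illustration: each feature is taken to be the edit distance between the full input sequence and one randomly generated reference sub-sequence, and the function names (`levenshtein`, `random_subsequences`, `levenshtein_features`), the reference count, and the reference length are hypothetical rather than taken from the paper.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each row stays short
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def random_subsequences(count: int, length: int, seed: int = 0) -> list[str]:
    """Generate `count` random DNA strings of the given length (assumed setup)."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(count)]


def levenshtein_features(sequence: str, references: list[str]) -> list[int]:
    """One feature per reference: distance from the sequence to that reference."""
    return [levenshtein(sequence, ref) for ref in references]


# Hypothetical usage: 5 references of length 8 give a 5-dimensional feature vector.
references = random_subsequences(count=5, length=8, seed=1)
features = levenshtein_features("ACGTACGTTAGC", references)
```

Under these assumptions, the resulting fixed-length vector could be passed to any standard classifier, which is how the abstract describes combining the features with different machine learning algorithms.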
Similar Papers
In the same crypt: q-bio.OT
R.I.P. 👻 Ghosted: A Probabilistic Framework for Quantifying Biological Complexity
R.I.P. 👻 Ghosted: An Introduction to Programming for Bioscientists: A Python-based Primer
R.I.P. 👻 Ghosted: Ultrametrics in the genetic code and the genome
R.I.P. 👻 Ghosted: Nutritionally recommended food for semi- to strict vegetarian diets based on large-scale nutrient composition data
R.I.P. 👻 Ghosted: Information Theory and the Length Distribution of all Discrete Systems
Died the same way: 👻 Ghosted
R.I.P. 👻 Ghosted: Federated Learning: Strategies for Improving Communication Efficiency
R.I.P. 👻 Ghosted: In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P. 👻 Ghosted: Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning