Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification

November 01, 2020 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Xiangxie Zhang, Ben Beinke, Berlian Al Kindhi, Marco Wiering
arXiv ID: 2011.00485
Category: q-bio.OT
Cross-listed: cs.AI, cs.LG
Citations: 12
Venue: arXiv.org
Last Checked: 3 months ago

Abstract

The classification of DNA sequences is a key research area in bioinformatics, as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification. Furthermore, we introduce a novel feature extraction method based on the Levenshtein distance and randomly generated DNA sub-sequences to compute information-rich features from the DNA sequences. We also use an existing feature extraction method based on 3-grams to represent amino acids, and we combine both feature extraction methods with a multitude of machine learning algorithms. Four different datasets, each concerning a viral disease (Covid-19, AIDS, Influenza, and Hepatitis C), are used for evaluating the different approaches. The results of the experiments show that all methods obtain high accuracies on the different DNA datasets. Furthermore, the domain-specific 3-gram feature extraction method generally leads to the best results in the experiments, while the newly proposed technique outperforms all other methods on the smallest Covid-19 dataset.
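The two feature extraction ideas named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the function names, the number and length of the random sub-sequences, comparing each whole sequence against each random sub-sequence, and counting overlapping 3-grams over the four-letter alphabet. It should not be read as the authors' method.

```python
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def random_probes(n_probes: int, length: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: draw random DNA sub-sequences to compare against."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(length))
            for _ in range(n_probes)]


def levenshtein_features(sequence: str, probes: list[str]) -> list[int]:
    """One feature per probe: the edit distance from the sequence to that probe."""
    return [levenshtein(sequence, probe) for probe in probes]


def trigram_features(sequence: str) -> list[int]:
    """Counts of all 64 possible overlapping 3-grams, in a fixed order."""
    counts = Counter(sequence[i:i + 3] for i in range(len(sequence) - 2))
    return [counts["".join(gram)] for gram in product(ALPHABET, repeat=3)]


if __name__ == "__main__":
    probes = random_probes(n_probes=5, length=8)
    sequence = "ACGTACGTTGCA"
    print(levenshtein_features(sequence, probes))
    print(sum(trigram_features(sequence)))
```

Either feature vector has a fixed length regardless of the sequence length, which is what lets it feed a standard classifier. Whether the paper counts overlapping or codon-aligned 3-grams is not stated in the abstract.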



📜 Similar Papers

In the same crypt — q-bio.OT

R.I.P. 👻 Ghosted

An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems

Hector Zenil, Narsis A. Kiani, ... (+6 more)

q-bio.OT 🏛 bioRxiv 📚 95 cites 8 years ago

R.I.P. 👻 Ghosted

A Probabilistic Framework for Quantifying Biological Complexity

Stuart M. Marshall, Alastair R. G. Murray, Leroy Cronin

q-bio.OT 🏛 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 📚 85 cites 9 years ago

R.I.P. 👻 Ghosted

An Introduction to Programming for Bioscientists: A Python-based Primer

Berk Ekmekci, Charles E. McAnany, Cameron Mura

q-bio.OT 🏛 PLoS Comput. Biol. 📚 69 cites 10 years ago

R.I.P. 👻 Ghosted

Ultrametrics in the genetic code and the genome

Branko Dragovich, Andrei Yu. Khrennikov, Nataša Ž. Mišić

q-bio.OT 🏛 Applied Mathematics and Computation 📚 26 cites 9 years ago

R.I.P. 👻 Ghosted

Nutritionally recommended food for semi- to strict vegetarian diets based on large-scale nutrient composition data

Seunghyeon Kim, Michael F. Fenech, Pan-Jun Kim

q-bio.OT 🏛 Sci. Rep. 📚 22 cites 8 years ago

R.I.P. 👻 Ghosted

Information Theory and the Length Distribution of all Discrete Systems

Les Hatton, Gregory Warr

q-bio.OT 🏛 arXiv 📚 9 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago