R.I.P.
Ghosted
ESM-NBR: fast and accurate nucleic acid-binding residue prediction via protein language model feature representation and multi-task learning
December 01, 2023 · Entered Twilight · IEEE International Conference on Bioinformatics and Biomedicine
Repo contents: ESM-NBR-standalone.zip, README.md, dataset, suppl-ESM-NBR(12.1).pdf
Authors
Wenwu Zeng, Dafeng Lv, Wenjuan Liu, Shaoliang Peng
arXiv ID
2312.00842
Category
q-bio.QM
Cross-listed
cs.LG
Citations
8
Venue
IEEE International Conference on Bioinformatics and Biomedicine
Repository
https://github.com/wwzll123/ESM-NBR
Last Checked
2 months ago
Abstract
Protein-nucleic acid interactions play an important role in a variety of biological activities. Accurate identification of nucleic acid-binding residues is a critical step in understanding the interaction mechanisms. Although many computational methods have been developed to predict nucleic acid-binding residues, challenges remain. In this study, a fast and accurate sequence-based method, called ESM-NBR, is proposed. In ESM-NBR, we first use the large protein language model ESM2 to extract a discriminative feature representation of biological properties from protein primary sequences; then, a multi-task deep learning model composed of stacked bidirectional long short-term memory (BiLSTM) and multi-layer perceptron (MLP) networks is employed to explore the common and private information of DNA- and RNA-binding residues, with the ESM2 features as input. Experimental results on benchmark data sets demonstrate that the ESM2 feature representation comprehensively outperforms evolutionary information-based hidden Markov model (HMM) features in prediction performance. Meanwhile, ESM-NBR obtains MCC values of 0.427 and 0.391 for DNA-binding residue prediction on two independent test sets, which are 18.61% and 10.45% higher than those of the second-best methods, respectively. Moreover, by completely discarding the time-consuming multiple sequence alignment process, the prediction speed of ESM-NBR far exceeds that of existing methods (5.52 s for a protein sequence of length 500, which is about 16 times faster than the second-fastest method). A user-friendly standalone package and the data of ESM-NBR are freely available for academic use at: https://github.com/wwzll123/ESM-NBR.
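The abstract reports performance as MCC (Matthews correlation coefficient) values. For readers unfamiliar with the metric, the following is a minimal sketch of how MCC can be computed from per-residue binary labels. The function name and the toy labels are illustrative assumptions, not taken from the ESM-NBR package, and the zero-denominator convention (returning 0.0) is one common choice rather than something the paper specifies.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Compute MCC for binary labels (1 = binding residue, 0 = non-binding).

    Hypothetical helper for illustration; not part of the ESM-NBR code.
    """
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: MCC is reported as 0 when it is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (made-up labels for an 8-residue sequence).
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
mcc = matthews_corrcoef(y_true, y_pred)  # TP=2, TN=4, FP=1, FN=1
```

MCC ranges from -1 to 1 and takes all four confusion-matrix cells into account, which makes it a common choice for binding-residue prediction, where binding residues are usually far fewer than non-binding ones.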
Similar Papers
In the same crypt: q-bio.QM
R.I.P.
Ghosted
GuacaMol: Benchmarking Models for De Novo Molecular Design
R.I.P.
Ghosted
DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
R.I.P.
Ghosted
ProtVec: A Continuous Distributed Representation of Biological Sequences
R.I.P.
Ghosted
A Perspective on Deep Imaging