Learning the rules of peptide self-assembly through data mining with large language models
November 08, 2024 · Declared Dead · Science Advances
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zhenze Yang, Sarah K. Yorke, Tuomas P. J. Knowles, Markus J. Buehler
arXiv ID
2411.05421
Category
cond-mat.soft
Cross-listed
cond-mat.dis-nn, cond-mat.mes-hall, cs.AI, cs.CL
Citations
9
Venue
Science Advances
Last Checked
3 months ago
Abstract
Peptides are ubiquitous and important biologically derived molecules that have been found to self-assemble to form a wide array of structures. Extensive research has explored the impacts of both internal chemical composition and external environmental stimuli on the self-assembly behavior of these systems. However, there is yet to be a systematic study that gathers this rich literature data and collectively examines these experimental factors to provide a global picture of the fundamental rules that govern protein self-assembly behavior. In this work, we curate a peptide assembly database through a combination of manual processing by human experts and literature mining facilitated by a large language model. As a result, we collect more than 1,000 experimental data entries with information about peptide sequence, experimental conditions and corresponding self-assembly phases. Utilizing the collected data, ML models are trained and evaluated, demonstrating excellent accuracy (>80%) and efficiency in peptide assembly phase classification. Moreover, we fine-tune our GPT model for peptide literature mining with the developed dataset, which exhibits markedly superior performance in extracting information from academic publications relative to the pre-trained model. We find that this workflow can substantially improve efficiency when exploring potential self-assembling peptide candidates, through guiding experimental work, while also deepening our understanding of the mechanisms governing peptide self-assembly. In doing so, novel structures can be accessed for a range of applications including sensing, catalysis and biomaterials.
Similar Papers
In the same crypt: cond-mat.soft
R.I.P. (Ghosted): Programming Soft Robots with Flexible Mechanical Metamaterials
R.I.P. (Ghosted): Polymers for Extreme Conditions Designed Using Syntax-Directed Variational Autoencoders
R.I.P. (Ghosted): Machine learning enables polymer cloud-point engineering via inverse design
R.I.P. (Ghosted): Programming Active Cohesive Granular Matter with Mechanically Induced Phase Changes
R.I.P. (Ghosted): Understanding Legged Crawling for Soft-Robotics
Died the same way: Ghosted
R.I.P. (Ghosted): Federated Learning: Strategies for Improving Communication Efficiency
R.I.P. (Ghosted): In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P. (Ghosted): Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning