Reconstructing Biological Pathways by Applying Selective Incremental Learning to (Very) Small Language Models
July 06, 2025 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Pranta Saha, Joyce Reimer, Brook Byrns, Connor Burbridge, Neeraj Dhar, Jeffrey Chen, Steven Rayan, Gordon Broderick
arXiv ID
2507.04432
Category
q-bio.MN
Cross-listed
cs.CL, cs.IT, cs.LG, cs.PF
Citations
1
Venue
arXiv.org
Last Checked
3 months ago
Abstract
The use of generative artificial intelligence (AI) models is becoming ubiquitous in many fields. Though progress continues to be made, general purpose large language AI models (LLM) show a tendency to deliver creative answers, often called "hallucinations", which have slowed their application in the medical and biomedical fields where accuracy is paramount. We propose that the design and use of much smaller, domain and even task-specific LM may be a more rational and appropriate use of this technology in biomedical research. In this work we apply a very small LM by today's standards to the specialized task of predicting regulatory interactions between molecular components to fill gaps in our current understanding of intracellular pathways. Toward this we attempt to correctly posit known pathway-informed interactions recovered from manually curated pathway databases by selecting and using only the most informative examples as part of an active learning scheme. With this example we show that a small (~110 million parameters) LM based on a Bidirectional Encoder Representations from Transformers (BERT) architecture can propose molecular interactions relevant to tuberculosis persistence and transmission with over 80% accuracy using less than 25% of the ~520 regulatory relationships in question. Using information entropy as a metric for the iterative selection of new tuning examples, we also find that increased accuracy is driven by favoring the use of the incorrectly assigned statements with the highest certainty (lowest entropy). In contrast, the concurrent use of correct but least certain examples contributed little and may have even been detrimental to the learning rate.
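The selection rule described in the abstract (favoring incorrectly assigned statements that the model was most certain about, i.e. lowest entropy) can be illustrated with a minimal sketch. This is not the authors' code: the function names `shannon_entropy` and `select_confident_errors`, the three-class probability vectors, and the batch size `k` are all assumptions made for illustration, and the class probabilities would in practice come from the tuned BERT classifier.

```python
import math
from typing import List, Sequence, Tuple


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def select_confident_errors(
    probs: List[Sequence[float]],
    labels: List[int],
    k: int,
) -> List[int]:
    """Return indices of up to k misclassified examples, lowest entropy first.

    Hypothetical helper: `probs[i]` is the model's class distribution for
    example i and `labels[i]` is the curated (true) class index.
    """
    scored: List[Tuple[float, int]] = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        predicted = max(range(len(p)), key=lambda c: p[c])
        if predicted != y:
            # Keep only wrong predictions, scored by how certain the model was.
            scored.append((shannon_entropy(p), i))
    scored.sort()  # ascending entropy: most confident mistakes come first
    return [i for _, i in scored[:k]]


# Toy pool with made-up probabilities (assumed three interaction classes).
pool_probs = [
    [0.95, 0.03, 0.02],  # confident, predicts class 0
    [0.40, 0.35, 0.25],  # uncertain, predicts class 0
    [0.10, 0.85, 0.05],  # confident, predicts class 1
    [0.34, 0.33, 0.33],  # very uncertain, predicts class 0
]
pool_labels = [1, 0, 1, 2]

next_batch = select_confident_errors(pool_probs, pool_labels, k=2)
```

In this toy pool, examples 0 and 3 are misclassified, and example 0 is ranked first because its prediction carries the lowest entropy. The selected indices would then be added to the tuning set for the next iteration.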
Similar Papers
In the same crypt: q-bio.MN
R.I.P.
👻
Ghosted
Large-scale analysis of disease pathways in the human interactome
R.I.P.
👻
Ghosted
Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks
R.I.P.
👻
Ghosted
AptRank: An Adaptive PageRank Model for Protein Function Prediction on Bi-relational Graphs
R.I.P.
👻
Ghosted
Learning of signaling networks: molecular mechanisms
R.I.P.
👻
Ghosted
Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
👻
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
👻
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
👻
Ghosted