A Library Perspective on Supervised Text Processing in Digital Libraries: An Investigation in the Biomedical Domain
November 06, 2024 Β· Declared Dead Β· π ACM/IEEE Joint Conference on Digital Libraries
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Hermann Kroll, Pascal Sackhoff, Bill Matthias Thang, Maha Ksouri, Wolf-Tilo Balke
arXiv ID
2411.12752
Category
cs.DL: Digital Libraries
Cross-listed
cs.CL
Citations
0
Venue
ACM/IEEE Joint Conference on Digital Libraries
Last Checked
3 months ago
Abstract
Digital libraries that maintain extensive textual collections may want to further enrich their content for certain downstream applications, e.g., building knowledge graphs, semantic enrichment of documents, or implementing novel access paths. All of these applications require some text processing, either to identify relevant entities, extract semantic relationships between them, or to classify documents into some categories. However, implementing reliable, supervised workflows can become quite challenging for a digital library because suitable training data must be crafted, and reliable models must be trained. While many works focus on achieving the highest accuracy on some benchmarks, we tackle the problem from a digital library practitioner. In other words, we also consider trade-offs between accuracy and application costs, dive into training data generation through distant supervision and large language models such as ChatGPT, LLama, and Olmo, and discuss how to design final pipelines. Therefore, we focus on relation extraction and text classification, using the showcase of eight biomedical benchmarks.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Digital Libraries
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Measuring academic influence: Not all citations are equal
R.I.P.
π»
Ghosted
The Open Access Advantage Considering Citation, Article Usage and Social Media Attention
R.I.P.
π»
Ghosted
A Bibliometric Review of Large Language Models Research from 2017 to 2023
R.I.P.
π»
Ghosted
On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering
R.I.P.
π»
Ghosted
A Systematic Identification and Analysis of Scientists on Twitter
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted