Transforming Role Classification in Scientific Teams Using LLMs and Advanced Predictive Analytics
January 13, 2025 · Declared Dead · Quantitative Science Studies
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Wonduk Seo, Yi Bu
arXiv ID
2501.07267
Category
cs.DL: Digital Libraries
Cross-listed
cs.SI
Citations
1
Venue
Quantitative Science Studies
Last Checked
3 months ago
Abstract
Scientific team dynamics are critical in determining the nature and impact of research outputs. However, existing methods for classifying author roles based on self-reports and clustering lack comprehensive contextual analysis of contributions. Thus, we present a transformative approach to classifying author roles in scientific teams using advanced large language models (LLMs), which offers a more refined analysis compared to traditional clustering methods. Specifically, we seek to complement and enhance these traditional methods by utilizing open source and proprietary LLMs, such as GPT-4, Llama3 70B, Llama2 70B, and Mistral 7x8B, for role classification. Utilizing few-shot prompting, we categorize author roles and demonstrate that GPT-4 outperforms other models across multiple categories, surpassing traditional approaches such as XGBoost and BERT. Our methodology also includes building a predictive deep learning model using 10 features. By training this model on a dataset derived from the OpenAlex database, which provides detailed metadata on academic publications -- such as author-publication history, author affiliation, research topics, and citation counts -- we achieve an F1 score of 0.76, demonstrating robust classification of author roles.
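For readers unfamiliar with the metric quoted in the abstract, here is a minimal sketch of how an F1 score can be computed for a multi-class role classifier. The abstract does not say whether the reported 0.76 is a macro, micro, or weighted average, so the macro average below is an assumption, and the role labels ("lead", "support") are hypothetical placeholders, not the paper's categories.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[t] += 1   # t was the truth but was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels, for illustration only (not data from the paper).
y_true = ["lead", "lead", "support", "support", "support", "lead"]
y_pred = ["lead", "support", "support", "support", "lead", "lead"]
print(round(macro_f1(y_true, y_pred), 4))
```

Macro averaging weights every role equally regardless of how often it occurs; a weighted or micro average would give different numbers on imbalanced data.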
Similar Papers
In the same crypt: Digital Libraries
R.I.P.
👻
Ghosted
Measuring academic influence: Not all citations are equal
R.I.P.
👻
Ghosted
The Open Access Advantage Considering Citation, Article Usage and Social Media Attention
R.I.P.
👻
Ghosted
A Bibliometric Review of Large Language Models Research from 2017 to 2023
R.I.P.
👻
Ghosted
On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering
R.I.P.
👻
Ghosted
A Systematic Identification and Analysis of Scientists on Twitter
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
👻
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
👻
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
👻
Ghosted