Transforming Role Classification in Scientific Teams Using LLMs and Advanced Predictive Analytics

January 13, 2025 · Declared Dead · 🏛 Quantitative Science Studies

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Wonduk Seo, Yi Bu
arXiv ID: 2501.07267
Category: cs.DL (Digital Libraries)
Cross-listed: cs.SI
Citations: 1
Venue: Quantitative Science Studies
Last Checked: 3 months ago

Abstract

Scientific team dynamics are critical in determining the nature and impact of research outputs. However, existing methods for classifying author roles, which rely on self-reports and clustering, lack a comprehensive contextual analysis of contributions. We therefore present an approach to classifying author roles in scientific teams using large language models (LLMs), which offers a more refined analysis than traditional clustering methods. Specifically, we seek to complement and enhance these traditional methods by using open-source and proprietary LLMs, such as GPT-4, Llama3 70B, Llama2 70B, and Mixtral 8x7B, for role classification. Using few-shot prompting, we categorize author roles and show that GPT-4 outperforms the other models across multiple categories, surpassing traditional approaches such as XGBoost and BERT. Our methodology also includes a predictive deep learning model built on 10 features. Trained on a dataset derived from the OpenAlex database, which provides detailed metadata on academic publications (such as author-publication history, author affiliation, research topics, and citation counts), this model achieves an F1 score of 0.76, demonstrating robust classification of author roles.
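The abstract does not spell out the prompt format, the role taxonomy, or how the F1 score is averaged. As a rough illustration only, the Python sketch below shows the general shape of a few-shot classification prompt and a macro-averaged F1 score. The role labels (leader, technical, supporting), the prompt template, and the function names are hypothetical placeholders, not the paper's; the LLM call itself is left out, and macro averaging is an assumption.

```python
# Hypothetical role labels for illustration only; the abstract does not list
# the paper's actual categories.
ROLES = ["leader", "technical", "supporting"]


def build_few_shot_prompt(examples, query, roles=ROLES):
    """Assemble a few-shot prompt: an instruction, k labeled examples, then the query.

    examples: list of (contribution_text, role) pairs shown to the model
    query: contribution text whose role the model should predict
    The template is an assumption, not the authors' prompt.
    """
    lines = ["Classify the author's role as one of: " + ", ".join(roles) + ".", ""]
    for text, role in examples:
        lines.append(f"Contribution: {text}")
        lines.append(f"Role: {role}")
        lines.append("")
    # Leave the final "Role:" open so the model completes it with a label.
    lines.append(f"Contribution: {query}")
    lines.append("Role:")
    return "\n".join(lines)


def per_class_f1(gold, pred, label):
    """One-vs-rest F1 for a single role label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=ROLES):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed here)."""
    return sum(per_class_f1(gold, pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        [("Designed the study and supervised the team", "leader")],
        "Ran the experiments and wrote the analysis code",
    )
    print(prompt)

    gold = ["leader", "technical", "technical", "supporting"]
    pred = ["leader", "technical", "supporting", "supporting"]
    print(f"{macro_f1(gold, pred):.4f}")  # prints 0.7778
```

Macro averaging weights each role equally, so a rare role that the classifier misses lowers the score as much as a common one; a micro- or support-weighted average would behave differently, and the abstract does not say which variant the reported 0.76 uses.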

📜 Similar Papers

In the same crypt — Digital Libraries

R.I.P. 👻 Ghosted

Constructing bibliometric networks: A comparison between full and fractional counting

Antonio Perianes-Rodriguez, Ludo Waltman, Nees Jan van Eck

cs.DL 🏛 J. Informetrics 📚 1.1K cites 9 years ago

R.I.P. 👻 Ghosted

Measuring academic influence: Not all citations are equal

Xiaodan Zhu, Peter Turney, ... (+2 more)

cs.DL 🏛 J. Assoc. Inf. Sci. Technol. 📚 262 cites 11 years ago

R.I.P. 👻 Ghosted

The Open Access Advantage Considering Citation, Article Usage and Social Media Attention

Xianwen Wang, Chen Liu, ... (+2 more)

cs.DL 🏛 Scientometrics 📚 224 cites 11 years ago

R.I.P. 👻 Ghosted

A Bibliometric Review of Large Language Models Research from 2017 to 2023

Lizhou Fan, Lingyao Li, ... (+4 more)

cs.DL 🏛 ACM TIST 📚 208 cites 3 years ago

R.I.P. 👻 Ghosted

On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering

Erica Mourão, João Felipe Pimentel, ... (+4 more)

cs.DL 🏛 IST 📚 157 cites 6 years ago

R.I.P. 👻 Ghosted

A Systematic Identification and Analysis of Scientists on Twitter

Qing Ke, Yong-Yeol Ahn, Cassidy R. Sugimoto

cs.DL 🏛 PLoS ONE 📚 147 cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago