Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods
June 23, 2023 · Declared Dead · Quality & Quantity: International Journal of Methodology
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Sepideh Fahimifar, Khadijeh Mousavi, Fatemeh Mozaffari, Marcel Ausloos
arXiv ID
2306.13492
Category
physics.soc-ph
Cross-listed
cs.SI
Citations
20
Venue
Quality & Quantity: International Journal of Methodology
Last Checked
3 months ago
Abstract
Highly cited papers are influenced by external factors that are not directly related to the document's intrinsic quality. In this study, 50 characteristics for measuring the performance of 68 highly cited papers from the Journal of the American Medical Informatics Association, indexed in Web of Science (WoS) from 2009 to 2019, were investigated. In the first step, a Pearson correlation analysis is performed to eliminate variables with zero or weak correlation with the target (dependent) variable ([number of citations in WoS]). Consequently, 32 variables are selected for the next step. By applying the Ridge technique, 13 features show a positive effect on the number of citations. Using three different algorithms, i.e., Ridge, Lasso, and Boruta, 6 factors appear to be the most relevant ones. The [Number of citations by international researchers], [Journal self-citations in citing documents], and [Authors' self-citations in citing documents] are recognized as the most important features by all three methods used here. The [First author's scientific age], [Open-access paper], and [Number of first author's citations in WoS] are identified as important features of highly cited papers by only two methods, Ridge and Lasso. Note that we use specific machine learning algorithms as feature selection methods (Ridge, Lasso, and Boruta) to identify the most important features of highly cited papers; these tools had not previously been used for this purpose. In conclusion, we re-emphasize the performance resulting from such algorithms. Moreover, we do not advise authors to seek to increase the citations of their articles by manipulating the identified performance features. Indeed, ethical rules regarding these characteristics must be strictly obeyed.
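The two-stage idea in the abstract (a Pearson correlation filter followed by a sparse linear model) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the data are synthetic, the feature names are hypothetical, the |r| threshold of 0.3 and the penalty `lam` are arbitrary (the abstract states neither), and the Lasso is a plain coordinate-descent implementation rather than the authors' software. Ridge and Boruta are left out for brevity.

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def standardize(col):
    """Rescale a column to zero mean and unit (population) variance."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(cols, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1.

    Assumes every column in `cols` is standardized and `y` is centered.
    """
    n, p = len(y), len(cols)
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction from all features except j
                partial = sum(w[k] * cols[k][i] for k in range(p) if k != j)
                rho += cols[j][i] * (y[i] - partial)
            w[j] = soft_threshold(rho / n, lam)
    return w


# Synthetic data (assumption): 68 rows, two informative and two noise features.
random.seed(0)
n = 68
data = {
    "intl_citations": [random.gauss(0, 1) for _ in range(n)],  # hypothetical name
    "self_citations": [random.gauss(0, 1) for _ in range(n)],  # hypothetical name
    "noise_a": [random.gauss(0, 1) for _ in range(n)],
    "noise_b": [random.gauss(0, 1) for _ in range(n)],
}
y = [3.0 * a + 2.5 * b + random.gauss(0, 0.5)
     for a, b in zip(data["intl_citations"], data["self_citations"])]

# Step 1: drop features whose correlation with the target is weak.
THRESHOLD = 0.3  # arbitrary cut-off, not taken from the paper
kept = [name for name, col in data.items() if abs(pearson(col, y)) >= THRESHOLD]

# Step 2: fit Lasso on the surviving features; nonzero weights are "selected".
y_mean = mean(y)
y_centered = [v - y_mean for v in y]
cols = [standardize(data[name]) for name in kept]
weights = dict(zip(kept, lasso_cd(cols, y_centered, lam=0.2)))
selected = [name for name, w in weights.items() if w != 0.0]

print("kept after Pearson filter:", kept)
print("Lasso weights:", weights)
```

Standardizing the columns before fitting puts the L1 penalty on a common scale, so that which coefficients are driven to zero does not depend on each feature's units.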
Similar Papers
In the same crypt: physics.soc-ph
The Cartographer
R.I.P.
👻
Ghosted
Networks beyond pairwise interactions: structure and dynamics
R.I.P.
👻
Ghosted
Statistical physics of human cooperation
R.I.P.
👻
Ghosted
Vital nodes identification in complex networks
R.I.P.
👻
Ghosted
Influence maximization in complex networks through optimal percolation
R.I.P.
👻
Ghosted
Scale-free networks are rare
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
👻
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
👻
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
👻
Ghosted