Stop Words for Processing Software Engineering Documents: Do they Matter?

March 18, 2023 · Declared Dead · 🏛 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE)

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Yaohou Fan, Chetan Arora, Christoph Treude
arXiv ID: 2303.10439
Category: cs.SE: Software Engineering
Cross-listed: cs.CL
Citations: 9
Venue: 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE)
Last Checked: 4 months ago
Abstract
Stop words, which are considered non-predictive, are often eliminated in natural language processing tasks. However, the definition of uninformative vocabulary is vague, so most algorithms use general knowledge-based stop lists to remove stop words. There is an ongoing debate among academics about the usefulness of stop word elimination, especially in domain-specific settings. In this work, we investigate the usefulness of stop word removal in a software engineering context. To do this, we replicate and experiment with three software engineering research tools from related work. Additionally, we construct a corpus of software engineering domain-related text from 10,000 Stack Overflow questions and identify 200 domain-specific stop words using traditional information-theoretic methods. Our results show that the use of domain-specific stop words significantly improved the performance of research tools compared to the use of a general stop list and that 17 out of 19 evaluation measures showed better performance. Online appendix: https://zenodo.org/record/7865748
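The abstract does not say which information-theoretic measures were used, so the following is only a minimal sketch of one common choice: ranking terms by the entropy of their distribution across documents. A term spread evenly over many documents has high entropy and discriminates poorly between them, which makes it a stop word candidate. The function names, the tokenizer, and the sample questions are hypothetical and are not taken from the paper or its appendix.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase words, keeping '+' and '#' so that
    # tokens such as "c++" and "c#" survive.
    return re.findall(r"[a-z][a-z0-9+#]*", text.lower())


def rank_stop_word_candidates(documents, top_k=200):
    """Rank terms by the entropy of their frequency distribution over documents.

    This is one possible information-theoretic criterion, assumed here for
    illustration; it is not claimed to be the method used in the paper.
    """
    # term -> Counter mapping document index to the term's count in that document
    term_doc_counts = defaultdict(Counter)
    for doc_id, text in enumerate(documents):
        for term, count in Counter(tokenize(text)).items():
            term_doc_counts[term][doc_id] = count

    scores = {}
    for term, per_doc in term_doc_counts.items():
        total = sum(per_doc.values())
        # Shannon entropy (in bits) of the term's distribution over documents.
        scores[term] = -sum(
            (c / total) * math.log2(c / total) for c in per_doc.values()
        )

    # Highest entropy first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Made-up sample questions standing in for a Stack Overflow corpus.
    sample = [
        "How do I fix this error in Python",
        "How do I sort a list in Java",
        "How do I parse JSON in Rust",
    ]
    for term, score in rank_stop_word_candidates(sample, top_k=4):
        print(f"{term}\t{score:.3f}")
```

On a real corpus, the `top_k=200` default mirrors the size of the domain-specific list mentioned in the abstract, but the cutoff and the scoring function would both need to be validated against the downstream tools.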
