Boosting Source Code Learning with Text-Oriented Data Augmentation: An Empirical Study
March 13, 2023 Β· Declared Dead Β· π IEEE International Conference on Software Quality, Reliability and Security Companion
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zeming Dong, Qiang Hu, Yuejun Guo, Zhenya Zhang, Maxime Cordy, Mike Papadakis, Yves Le Traon, Jianjun Zhao
arXiv ID
2303.06808
Category
cs.SE: Software Engineering
Cross-listed
cs.AI
Citations
10
Venue
IEEE International Conference on Software Quality, Reliability and Security Companion
Last Checked
4 months ago
Abstract
Recent studies have demonstrated remarkable advancements in source code learning, which applies deep neural networks (DNNs) to tackle various software engineering tasks. Similar to other DNN-based domains, source code learning also requires massive high-quality training data to achieve the success of these applications. Data augmentation, a technique used to produce additional training data, is widely adopted in other domains (e.g. computer vision). However, the existing practice of data augmentation in source code learning is limited to simple syntax-preserved methods, such as code refactoring. In this paper, considering that source code can also be represented as text data, we take an early step to investigate the effectiveness of data augmentation methods originally designed for natural language texts in the context of source code learning. To this end, we focus on code classification tasks and conduct a comprehensive empirical study across four critical code problems and four DNN architectures to assess the effectiveness of 25 data augmentation methods. Our results reveal specific data augmentation methods that yield more accurate and robust models for source code learning. Additionally, we discover that the data augmentation methods remain beneficial even when they slightly break source code syntax.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Software Engineering
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Microservices: yesterday, today, and tomorrow
π
π
The Cartographer
A Survey of Machine Learning for Big Code and Naturalness
R.I.P.
π»
Ghosted
An Overview on Smart Contracts: Challenges, Advances and Platforms
R.I.P.
π»
Ghosted
Slither: A Static Analysis Framework For Smart Contracts
R.I.P.
π»
Ghosted
ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted