Revisiting Process versus Product Metrics: a Large Scale Analysis

August 21, 2020 · Declared Dead · 🏛 Empirical Software Engineering

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Suvodeep Majumder, Pranav Mody, Tim Menzies
arXiv ID: 2008.09569
Category: cs.SE (Software Engineering)
Cross-listed: cs.LG
Citations: 22
Venue: Empirical Software Engineering
Last Checked: 4 months ago

Abstract

Numerous methods can build predictive models from software data. However, what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors for defects than product metrics (best process/product-based learners respectively achieve recalls of 98%/44% and AUCs of 95%/54%, median values). That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models (since single model predictions can become confused and exhibit a high variance).
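The abstract reports recall and AUC as median values across projects. As a rough illustration of what such a summary involves, the sketch below computes both measures per project and takes the median. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy data are all assumptions made for this example.

```python
from statistics import median


def recall(y_true, y_pred):
    """Fraction of truly defective items (label 1) that were predicted defective."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """Chance that a random defective item scores above a random clean one.

    Ties count as half a win. This is the rank-based definition of ROC AUC.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def median_scores(projects, threshold=0.5):
    """Return (median recall, median AUC) over a list of (labels, scores) pairs.

    `threshold` is a hypothetical cut-off for turning scores into predictions.
    """
    recalls, aucs = [], []
    for y_true, scores in projects:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        recalls.append(recall(y_true, y_pred))
        aucs.append(auc(y_true, scores))
    return median(recalls), median(aucs)


# Toy data, invented for illustration: one (labels, scores) pair per project.
toy_projects = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),
    ([0, 1, 0, 1], [0.7, 0.6, 0.2, 0.4]),
]

print(median_scores(toy_projects))
```

Reporting the median rather than the mean keeps a few unusually easy or hard projects from dominating the summary, which matters when hundreds of projects are pooled.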

📜 Similar Papers

In the same crypt — Software Engineering

R.I.P. 👻 Ghosted

DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars

Yuchi Tian, Kexin Pei, ... (+2 more)

cs.SE 🏛 ICSE 📚 1.4K cites 8 years ago

R.I.P. 👻 Ghosted

Microservices: yesterday, today, and tomorrow

Nicola Dragoni, Saverio Giallorenzo, ... (+5 more)

cs.SE 🏛 Present and Ulterior Software Engineering 📚 1.1K cites 10 years ago

📚 📚 The Cartographer

A Survey of Machine Learning for Big Code and Naturalness

Miltiadis Allamanis, Earl T. Barr, ... (+2 more)

cs.SE 🏛 ACM CSUR 📚 962 cites 8 years ago

R.I.P. 👻 Ghosted

An Overview on Smart Contracts: Challenges, Advances and Platforms

Zibin Zheng, Shaoan Xie, ... (+5 more)

cs.SE 🏛 FGCS 📚 917 cites 6 years ago

R.I.P. 👻 Ghosted

Slither: A Static Analysis Framework For Smart Contracts

Josselin Feist, Gustavo Grieco, Alex Groce

cs.SE 🏛 ICETSEB W 📚 823 cites 6 years ago

R.I.P. 👻 Ghosted

ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection

Bo Jiang, Ye Liu, W. K. Chan

cs.SE 🏛 ASE 📚 790 cites 7 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago