Data Smells: Categories, Causes and Consequences, and Detection of Suspicious Data in AI-based Systems

March 19, 2022 · Declared Dead · 🏛 2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN)

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Harald Foidl, Michael Felderer, Rudolf Ramler
arXiv ID: 2203.10384
Category: cs.SE (Software Engineering)
Cross-listed: cs.AI
Citations: 44
Venue: 2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN)
Last Checked: 4 months ago

Abstract
High data quality is fundamental for today's AI-based systems. However, although data quality has been an object of research for decades, there is a clear lack of research on potential data quality issues (e.g., ambiguous, extraneous values). These kinds of issues are latent in nature and thus often not obvious. Nevertheless, they can be associated with an increased risk of future problems in AI-based systems (e.g., technical debt, data-induced faults). As a counterpart to code smells in software engineering, we refer to such issues as Data Smells. This article conceptualizes data smells and elaborates on their causes, consequences, detection, and use in the context of AI-based systems. In addition, a catalogue of 36 data smells divided into three categories (i.e., Believability Smells, Understandability Smells, Consistency Smells) is presented. Moreover, the article outlines tool support for detecting data smells and presents the result of an initial smell detection on more than 240 real-world datasets.
