A survey of open-source data quality tools: shedding light on the materialization of data quality dimensions in practice
July 26, 2024 ยท The Cartographer ยท ๐ arXiv.org
"No code URL or promise found in abstract"
"Title-pattern auto-detect: A survey of open-source data quality tools: shedding light on the materialization of data quality di"
Evidence collected by the PWNC Scanner
Authors
Vasileios Papastergios, Anastasios Gounaris
arXiv ID
2407.18649
Category
cs.DB: Databases
Citations
5
Venue
arXiv.org
Last Checked
3 days ago
Abstract
Data Quality (DQ) describes the degree to which data characteristics meet requirements and are fit for use by humans and/or systems. There are several aspects in which DQ can be measured, called DQ dimensions (i.e. accuracy, completeness, consistency, etc.), also referred to as characteristics in literature. ISO/IEC 25012 Standard defines a data quality model with fifteen such dimensions, setting the requirements a data product should meet. In this short report, we aim to bridge the gap between lower-level functionalities offered by DQ tools and higher-level dimensions in a systematic manner, revealing the many-to-many relationships between them. To this end, we examine 6 open-source DQ tools and we emphasize on providing a mapping between the functionalities they offer and the DQ dimensions, as defined by the ISO standard. Wherever applicable, we also provide insights into the software engineering details that tools leverage, in order to address DQ challenges.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Databases
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Untangling Blockchain: A Data Processing View of Blockchain Systems
R.I.P.
๐ป
Ghosted
Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades
R.I.P.
๐ป
Ghosted
BLOCKBENCH: A Framework for Analyzing Private Blockchains
R.I.P.
๐ป
Ghosted
Data Synthesis based on Generative Adversarial Networks
R.I.P.
๐ป
Ghosted