Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective

July 28, 2023 ยท Entered Twilight ยท ๐Ÿ› arXiv.org

๐Ÿ’ค TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: .DS_Store, README.md, code, data, requirements.txt

Authors Renyu Zhu, Haoyu Liu, Runze Wu, Minmin Lin, Tangjie Lv, Changjie Fan, Haobo Wang arXiv ID 2307.16889 Category cs.LG: Machine Learning Cross-listed cs.AI, cs.HC Citations 2 Venue arXiv.org Repository https://github.com/fuxiAIlab/ProtoSemi โญ 6 Last Checked 3 months ago
Abstract
In this paper, we investigate the problem of learning with noisy labels in real-world annotation scenarios, where noise can be categorized into two types: factual noise and ambiguity noise. To better distinguish these noise types and utilize their semantics, we propose a novel sample selection-based approach for noisy label learning, called Proto-semi. Proto-semi initially divides all samples into the confident and unconfident datasets via warm-up. By leveraging the confident dataset, prototype vectors are constructed to capture class characteristics. Subsequently, the distances between the unconfident samples and the prototype vectors are calculated to facilitate noise classification. Based on these distances, the labels are either corrected or retained, resulting in the refinement of the confident and unconfident datasets. Finally, we introduce a semi-supervised learning method to enhance training. Empirical evaluations on a real-world annotated dataset substantiate the robustness of Proto-semi in handling the problem of learning from noisy labels. Meanwhile, the prototype-based repartitioning strategy is shown to be effective in mitigating the adverse impact of label noise. Our code and data are available at https://github.com/fuxiAIlab/ProtoSemi.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Machine Learning