R.I.P.
๐ป
Ghosted
Ontology-aware Learning and Evaluation for Audio Tagging
November 22, 2022 ยท Entered Twilight ยท ๐ Interspeech
Repo contents: .gitignore, README.md, ontology_audio_tagging, setup.py, test
Authors
Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley
arXiv ID
2211.12195
Category
eess.AS: Audio & Speech
Cross-listed
cs.AI,
cs.LG,
cs.SD,
eess.SP
Citations
6
Venue
Interspeech
Repository
https://github.com/haoheliu/ontology-aware-audio-tagging
โญ 13
Last Checked
1 month ago
Abstract
This study defines a new evaluation metric for audio tagging tasks to overcome the limitation of the conventional mean average precision (mAP) metric, which treats different kinds of sound as independent classes without considering their relations. Also, due to the ambiguities in sound labeling, the labels in the training and evaluation set are not guaranteed to be accurate and exhaustive, which poses challenges for robust evaluation with mAP. The proposed metric, ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation. Specifically, we reweight the false positive events in the model prediction based on the ontology graph distance to the target classes. The OmAP measure also provides more insights into model performance by evaluations with different coarse-grained levels in the ontology graph. We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP. To further verify the importance of utilizing the ontology information, we also propose a novel loss function (OBCE) that reweights binary cross entropy (BCE) loss based on the ontology distance. Our experiment shows that OBCE can improve both mAP and OmAP metrics on the AudioSet tagging task.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Audio & Speech
R.I.P.
๐ป
Ghosted
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
R.I.P.
๐ป
Ghosted
DiffWave: A Versatile Diffusion Model for Audio Synthesis
R.I.P.
๐ป
Ghosted
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
R.I.P.
๐ป
Ghosted
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
R.I.P.
๐ป
Ghosted