Combining High-Level Features of Raw Audio Waves and Mel-Spectrograms for Audio Tagging

November 26, 2018 · Declared Dead · 🏛 Workshop on Detection and Classification of Acoustic Scenes and Events

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Marcel Lederle, Benjamin Wilhelm
arXiv ID: 1811.10708
Category: cs.SD (Sound)
Cross-listed: cs.LG, eess.AS
Citations: 10
Venue: Workshop on Detection and Classification of Acoustic Scenes and Events
Last Checked: 3 months ago
Abstract
In this paper, we describe our contribution to Task 2 of the DCASE 2018 Audio Challenge. While it has become ubiquitous to utilize an ensemble of machine learning methods for classification tasks to obtain better predictive performance, the majority of ensemble methods combine predictions rather than learned features. We propose a single-model method that combines learned high-level features computed from log-scaled mel-spectrograms and raw audio data. These features are learned separately by two Convolutional Neural Networks, one for each input type, and then combined by densely connected layers within a single network. This relatively simple approach along with data augmentation ranks among the best two percent in the Freesound General-Purpose Audio Tagging Challenge on Kaggle.
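The feature-level fusion described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation (no code is linked): a single convolution with global max pooling stands in for each of the two CNNs, and the weights are random and untrained. The function names, layer sizes, input shapes, and class count are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
N_CLASSES = 41   # hypothetical number of tags
N_FILTERS = 8    # hypothetical feature size per branch


def raw_audio_branch(wave, kernels):
    """Stand-in for the raw-audio CNN: 1-D convolution, ReLU, global max pool."""
    windows = sliding_window_view(wave, kernels.shape[1])       # (T - k + 1, k)
    return np.maximum(windows @ kernels.T, 0.0).max(axis=0)     # (N_FILTERS,)


def mel_branch(log_mel, kernels):
    """Stand-in for the spectrogram CNN: 2-D convolution, ReLU, global max pool."""
    windows = sliding_window_view(log_mel, kernels.shape[1:])   # (H', W', kh, kw)
    maps = np.einsum("ijhw,khw->ijk", windows, kernels)         # (H', W', N_FILTERS)
    return np.maximum(maps, 0.0).max(axis=(0, 1))               # (N_FILTERS,)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


# Random stand-ins for one clip: a waveform and its log-scaled mel-spectrogram.
wave = rng.standard_normal(4000)
log_mel = rng.standard_normal((64, 50))

# Untrained weights; a real model would learn these.
wave_kernels = 0.1 * rng.standard_normal((N_FILTERS, 9))
mel_kernels = 0.1 * rng.standard_normal((N_FILTERS, 3, 3))
w_hidden = 0.1 * rng.standard_normal((2 * N_FILTERS, 32))
w_out = 0.1 * rng.standard_normal((32, N_CLASSES))

# Fusion happens on learned features, not on per-model predictions:
# the two feature vectors are concatenated, then passed through dense layers.
fused = np.concatenate([
    raw_audio_branch(wave, wave_kernels),
    mel_branch(log_mel, mel_kernels),
])
hidden = np.maximum(fused @ w_hidden, 0.0)
probs = softmax(hidden @ w_out)

print(fused.shape, probs.shape)
```

The contrast with a prediction-level ensemble is the concatenation step: a conventional ensemble would average or vote over each model's class probabilities, whereas here the dense layers see both feature vectors at once within one network.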
