Detecting the presence of sperm whales echolocation clicks in noisy environments
December 31, 2023 Β· Declared Dead Β· π IEEE/ACM Transactions on Audio Speech and Language Processing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Guy Gubnitsky, Roee Diamant
arXiv ID
2401.00900
Category
eess.AS: Audio & Speech
Cross-listed
cs.LG,
cs.SD
Citations
9
Venue
IEEE/ACM Transactions on Audio Speech and Language Processing
Last Checked
3 months ago
Abstract
Sperm whales (Physeter macrocephalus) navigate underwater with a series of impulsive, click-like sounds known as echolocation clicks. These clicks are characterized by a multipulse structure (MPS) that serves as a distinctive pattern. In this work, we use the stability of the MPS as a detection metric for recognizing and classifying the presence of clicks in noisy environments. To distinguish between noise transients and to handle simultaneous emissions from multiple sperm whales, our approach clusters a time series of MPS measures while removing potential clicks that do not fulfil the limits of inter-click interval, duration and spectrum. As a result, our approach can handle high noise transients and low signal-to-noise ratio. The performance of our detection approach is examined using three datasets: seven months of recordings from the Mediterranean Sea containing manually verified ambient noise; several days of manually labelled data collected from the Dominica Island containing approximately 40,000 clicks from multiple sperm whales; and a dataset from the Bahamas containing 1,203 labelled clicks from a single sperm whale. Comparing with the results of two benchmark detectors, a better trade-off between precision and recall is observed as well as a significant reduction in false detection rates, especially in noisy environments. To ensure reproducibility, we provide our database of labelled clicks along with our implementation code.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Audio & Speech
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
R.I.P.
π»
Ghosted
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
R.I.P.
π»
Ghosted
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
R.I.P.
π»
Ghosted
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
R.I.P.
π»
Ghosted
Utterance-level Aggregation For Speaker Recognition In The Wild
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted