Artificial Neural Networks to Recognize Speakers Division from Continuous Bengali Speech
April 18, 2024 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Hasmot Ali, Md. Fahad Hossain, Md. Mehedi Hasan, Sheikh Abujar, Sheak Rashed Haider Noori
arXiv ID
2404.15168
Category
eess.AS: Audio & Speech
Cross-listed
cs.HC,
cs.LG,
cs.SD
Citations
1
Venue
arXiv.org
Last Checked
3 months ago
Abstract
Voice based applications are ruling over the era of automation because speech has a lot of factors that determine a speakers information as well as speech. Modern Automatic Speech Recognition (ASR) is a blessing in the field of Human-Computer Interaction (HCI) for efficient communication among humans and devices using Artificial Intelligence technology. Speech is one of the easiest mediums of communication because it has a lot of identical features for different speakers. Nowadays it is possible to determine speakers and their identity using their speech in terms of speaker recognition. In this paper, we presented a method that will provide a speakers geographical identity in a certain region using continuous Bengali speech. We consider eight different divisions of Bangladesh as the geographical region. We applied the Mel Frequency Cepstral Coefficient (MFCC) and Delta features on an Artificial Neural Network to classify speakers division. We performed some preprocessing tasks like noise reduction and 8-10 second segmentation of raw audio before feature extraction. We used our dataset of more than 45 hours of audio data from 633 individual male and female speakers. We recorded the highest accuracy of 85.44%.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Audio & Speech
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
R.I.P.
π»
Ghosted
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
R.I.P.
π»
Ghosted
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
R.I.P.
π»
Ghosted
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
R.I.P.
π»
Ghosted
Utterance-level Aggregation For Speaker Recognition In The Wild
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted