Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network
November 22, 2024 · Entered Twilight · arXiv.org
Repo contents: .gitattributes, .gitignore, .idea, LICENSE, README.md, __pycache__, app.py, audio_slicer.py, data, empty_plot.png, eval_metrics, generate_plot.py, guiapp.py, guiapp.spec, main_ui.py, model.h5, model_logs, models_saves, pyqt5_ui, requirements.txt, resampler.py, saved_variable.joblib, slicer.py, testing_stream_app.py, train_model.py, voice-recognition.ico, waveform_fft_output.png
Authors
Irfan Nafiz Shahan, Pulok Ahmed Auvi
arXiv ID
2411.15082
Category
cs.SD: Sound
Cross-listed
cs.AI, cs.LG, eess.AS
Citations
1
Venue
arXiv.org
Repository
https://github.com/IrfanNafiz/RecMe
⭐ 4
Last Checked
3 months ago
Abstract
Voice recognition and speaker identification are vital for applications in security and personal assistants. This paper presents a lightweight 1D-Convolutional Neural Network (1D-CNN) designed to perform speaker identification on minimal datasets. Our approach achieves a validation accuracy of 97.87%, leveraging data augmentation techniques to handle background noise and limited training samples. Future improvements include testing on larger datasets and integrating transfer learning methods to enhance generalizability. We provide all code, the custom dataset, and the trained models to facilitate reproducibility. These resources are available on our GitHub repository: https://github.com/IrfanNafiz/RecMe.
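The abstract mentions data augmentation to handle background noise but does not say how the noise is applied. As a rough illustration only, the sketch below mixes a noise clip into a speech clip at a chosen signal-to-noise ratio, which is one common way to do this. The helper names (`rms`, `mix_noise`, `measured_snr_db`), the 16 kHz sample rate, and the 10 dB target are assumptions made for this example and are not taken from the paper or the RecMe repository.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_noise(speech, noise, snr_db):
    """Return speech plus scaled noise so the mix has the requested SNR in dB.

    Hypothetical helper: the scaling follows the standard SNR definition,
    not a procedure documented in the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise = noise[:len(speech)]
    speech_rms, noise_rms = rms(speech), rms(noise)
    if noise_rms == 0.0:
        return list(speech)  # silent noise clip: nothing to mix in
    # SNR_dB = 20 * log10(speech_rms / (scale * noise_rms)), solved for scale.
    scale = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """SNR of a mix, recovered by subtracting the known clean speech."""
    residual = [m - s for m, s in zip(mixed, speech)]
    return 20 * math.log10(rms(speech) / rms(residual))


rng = random.Random(0)  # fixed seed so the example is repeatable
sample_rate = 16000     # assumed sample rate in Hz
# One second of a 220 Hz tone stands in for a speech recording.
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
augmented = mix_noise(speech, noise, snr_db=10)
```

Generating several copies of each recording at different SNR values is a simple way to enlarge a small training set, although the noise sources and levels the authors actually used would have to be checked against their code.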
Similar Papers
In the same crypt: Sound
The Ethereal
R.I.P.
Ghosted
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
R.I.P.
Ghosted
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
R.I.P.
Ghosted
TasNet: time-domain audio separation network for real-time, single-channel speech separation
R.I.P.
Ghosted
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
R.I.P.
Ghosted