Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network

November 22, 2024 · Entered Twilight · 🏛 arXiv.org

💤 TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: .gitattributes, .gitignore, .idea, LICENSE, README.md, __pycache__, app.py, audio_slicer.py, data, empty_plot.png, eval_metrics, generate_plot.py, guiapp.py, guiapp.spec, main_ui.py, model.h5, model_logs, models_saves, pyqt5_ui, requirements.txt, resampler.py, saved_variable.joblib, slicer.py, testing_stream_app.py, train_model.py, voice-recognition.ico, waveform_fft_output.png

Authors: Irfan Nafiz Shahan, Pulok Ahmed Auvi
arXiv ID: 2411.15082
Category: cs.SD (Sound)
Cross-listed: cs.AI, cs.LG, eess.AS
Citations: 1
Venue: arXiv.org
Repository: https://github.com/IrfanNafiz/RecMe (⭐ 4)
Last checked: 3 months ago

Abstract
Voice recognition and speaker identification are vital for applications in security and personal assistants. This paper presents a lightweight 1D-Convolutional Neural Network (1D-CNN) designed to perform speaker identification on minimal datasets. Our approach achieves a validation accuracy of 97.87%, leveraging data augmentation techniques to handle background noise and limited training samples. Future improvements include testing on larger datasets and integrating transfer learning methods to enhance generalizability. We provide all code, the custom dataset, and the trained models to facilitate reproducibility. These resources are available on our GitHub repository: https://github.com/IrfanNafiz/RecMe.
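
The abstract mentions data augmentation to handle background noise but does not say how it is done. As a rough illustration only, the sketch below mixes a noise recording into a clean clip at a chosen signal-to-noise ratio, which is one common way to build such augmented samples. The function name `mix_at_snr`, the 16 kHz sample rate, the 440 Hz test tone, and the 10 dB target are all assumptions made for this example and are not taken from the paper or its repository.

```python
import math
import random


def mix_at_snr(signal, noise, snr_db):
    """Return signal + scaled noise so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        # Silent noise clip: nothing to mix in.
        return list(signal)
    # Choose scale so that signal_power / (scale^2 * noise_power) = 10^(snr_db / 10).
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


# Assumed setup: one second of a 440 Hz tone at 16 kHz stands in for speech,
# and Gaussian noise stands in for a background recording.
SAMPLE_RATE = 16000
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
noise = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

Generating several noisy copies of each clip at different SNR values is a cheap way to enlarge a small dataset, which is the situation the abstract describes; whether the authors used this exact scheme is not stated here.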