R.I.P.
👻
Ghosted
M3SD: Multi-modal, Multi-scenario and Multi-language Speaker Diarization Dataset
June 17, 2025 · Declared Dead · arXiv.org
Authors
Shilong Wu
arXiv ID
2506.14427
Category
eess.AS: Audio & Speech
Cross-listed
cs.MM
Citations
0
Venue
arXiv.org
Repository
https://huggingface.co/spaces/OldDragon/m3sd
Last Checked
3 months ago
Abstract
In the field of speaker diarization, the development of technology is constrained by two problems: insufficient data resources and the poor generalization ability of deep learning models. To address these two problems, we first propose an automated method for constructing speaker diarization datasets, which generates more accurate pseudo-labels for massive data by combining audio and video. Relying on this method, we have released the Multi-modal, Multi-scenario and Multi-language Speaker Diarization (M3SD) dataset. This dataset is derived from real online videos and is highly diverse. Our dataset and code have been open-sourced at https://huggingface.co/spaces/OldDragon/m3sd.
Similar Papers
In the same crypt: Audio & Speech
R.I.P.
👻
Ghosted
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
R.I.P.
👻
Ghosted
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
R.I.P.
👻
Ghosted
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
R.I.P.
👻
Ghosted
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
R.I.P.
👻
Ghosted
Utterance-level Aggregation For Speaker Recognition In The Wild
Died the same way: 404 Not Found
R.I.P.
404 Not Found
Deep High-Resolution Representation Learning for Visual Recognition
R.I.P.
404 Not Found
HuggingFace's Transformers: State-of-the-art Natural Language Processing
R.I.P.
404 Not Found
CCNet: Criss-Cross Attention for Semantic Segmentation
R.I.P.
404 Not Found