The Ethereal
Detecting Syllable-Level Pronunciation Stress with A Self-Attention Model
November 01, 2023 · Entered Twilight · arXiv.org
Repo contents: README.md, _audios, data collection.ipynb, data for training and testing, models trained, pronunciation dictionary, self_attention_all_features.py, self_attention_numerical.py, training and testing.ipynb
Authors
Wang Weiying, Nakajima Akinori
arXiv ID
2311.00301
Category
cs.SD: Sound
Cross-listed
cs.CL,
eess.AS
Citations
0
Venue
arXiv.org
Repository
https://github.com/wangweiying303/stress-detection-model
⭐ 8
Last Checked
3 months ago
Abstract
One precondition of effective oral communication is that words be pronounced clearly, especially by non-native speakers. Word stress is key to clear and correct English, and misplaced syllable stress may lead to misunderstandings. Thus, knowing the stress level is important for English speakers and learners. This paper presents a self-attention model to identify the stress level of each syllable of spoken English. Various prosodic and categorical features, including the pitch level, intensity, duration, and type of the syllable and of its nucleus (the vowel of the syllable), are explored. These features are input to the self-attention model, which predicts syllable-level stress. The simplest model yields accuracies of over 88% and 93% on different datasets, while more advanced models provide higher accuracy. Our study suggests that the self-attention model is promising for stress-level detection. These models could be applied in various scenarios, such as online meetings and English learning.
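The pipeline the abstract describes (per-syllable prosodic features passed through self-attention, then one stress prediction per syllable) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the four-value feature layout, the feature values, the three stress levels, the layer sizes, and the function names are all hypothetical. The weights are random and untrained, so the output shows only the data flow, not meaningful predictions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a syllable sequence."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # syllable-to-syllable similarities
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def predict_stress(features, n_levels=3, d_model=8, seed=0):
    """Return per-syllable stress-level probabilities and attention weights.

    n_levels and d_model are assumed values; the weights are random
    (untrained), so the probabilities are not real predictions.
    """
    rng = random.Random(seed)
    n_feat = len(features[0])
    w_q = random_matrix(n_feat, d_model, rng)
    w_k = random_matrix(n_feat, d_model, rng)
    w_v = random_matrix(n_feat, d_model, rng)
    w_out = random_matrix(d_model, n_levels, rng)
    context, weights = self_attention(features, w_q, w_k, w_v)
    logits = matmul(context, w_out)
    return [softmax(row) for row in logits], weights


# Hypothetical normalized features for a three-syllable word:
# [pitch, intensity, syllable duration, nucleus duration]
syllables = [
    [0.2, 0.3, 0.4, 0.3],
    [0.9, 0.8, 0.7, 0.6],
    [0.1, 0.2, 0.3, 0.2],
]

probs, attn = predict_stress(syllables)
for i, p in enumerate(probs):
    print(f"syllable {i}: stress-level probabilities {[round(v, 3) for v in p]}")
```

Because each syllable attends to every other syllable in the word, the prediction for one syllable can depend on how its pitch, intensity, and duration compare with its neighbors, which is one plausible reason self-attention suits a relative property such as stress.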
Similar Papers
In the same crypt: Sound
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
TasNet: time-domain audio separation network for real-time, single-channel speech separation
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model