Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed
February 01, 2019 Β· Declared Dead Β· π IEEE International Conference on Acoustics, Speech, and Signal Processing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Atta Norouzian, Bogdan Mazoure, Dermot Connolly, Daniel Willett
arXiv ID
1902.00570
Category
cs.HC: Human-Computer Interaction
Cross-listed
cs.CL,
eess.AS
Citations
29
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Last Checked
4 months ago
Abstract
Voice controlled virtual assistants (VAs) are now available in smartphones, cars, and standalone devices in homes. In most cases, the user needs to first "wake-up" the VA by saying a particular word/phrase every time he or she wants the VA to do something. Eliminating the need for saying the wake-up word for every interaction could improve the user experience. This would require the VA to have the capability to detect the speech that is being directed at it and respond accordingly. In other words, the challenge is to distinguish between system-directed and non-system-directed speech utterances. In this paper, we present a number of neural network architectures for tackling this classification problem based on using only acoustic features. These architectures are based on using convolutional, recurrent and feed-forward layers. In addition, we investigate the use of an attention mechanism applied to the output of the convolutional and the recurrent layers. It is shown that incorporating the proposed attention mechanism into the models always leads to significant improvement in classification accuracy. The best model achieved equal error rates of 16.25 and 15.62 percents on two distinct realistic datasets.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Human-Computer Interaction
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Improving fairness in machine learning systems: What do industry practitioners need?
R.I.P.
π»
Ghosted
Identifying Stable Patterns over Time for Emotion Recognition from EEG
R.I.P.
π»
Ghosted
Questioning the AI: Informing Design Practices for Explainable AI User Experiences
R.I.P.
π»
Ghosted
Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities
R.I.P.
π»
Ghosted
Educational data mining and learning analytics: An updated survey
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted