Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models

October 28, 2024 Β· Declared Dead Β· πŸ› arXiv.org

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Ognjen, Rudovic, Pranay Dighe, Yi Su, Vineet Garg, Sameer Dharur, Xiaochuan Niu, Ahmed H. Abdelaziz, Saurabh Adya, Ahmed Tewfik arXiv ID 2411.00023 Category eess.AS: Audio & Speech Cross-listed cs.AI, cs.CL, cs.SD Citations 0 Venue arXiv.org Last Checked 3 months ago
Abstract
Follow-up conversations with virtual assistants (VAs) enable a user to seamlessly interact with a VA without the need to repeatedly invoke it using a keyword (after the first query). Therefore, accurate Device-directed Speech Detection (DDSD) from the follow-up queries is critical for enabling naturalistic user experience. To this end, we explore the notion of Large Language Models (LLMs) and model the first query when making inference about the follow-ups (based on the ASR-decoded text), via prompting of a pretrained LLM, or by adapting a binary classifier on top of the LLM. In doing so, we also exploit the ASR uncertainty when designing the LLM prompts. We show on the real-world dataset of follow-up conversations that this approach yields large gains (20-40% reduction in false alarms at 10% fixed false rejects) due to the joint modeling of the previous speech context and ASR uncertainty, compared to when follow-ups are modeled alone.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Audio & Speech

Died the same way β€” πŸ‘» Ghosted