Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition

April 15, 2025 Β· Declared Dead Β· πŸ› ACM Symposium on User Interface Software and Technology

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Naoto Nishida, Hirotaka Hiraki, Jun Rekimoto, Yoshio Ishiguro arXiv ID 2504.10849 Category cs.HC: Human-Computer Interaction Cross-listed cs.MM, cs.SD, eess.AS Citations 0 Venue ACM Symposium on User Interface Software and Technology Last Checked 4 months ago
Abstract
Rich-text captions are essential to help communication for Deaf and hard-of-hearing (DHH) people, second-language learners, and those with autism spectrum disorder (ASD). They also preserve nuances when converting speech to text, enhancing the realism of presentation scripts and conversation or speech logs. However, current real-time captioning systems lack the capability to alter text attributes (ex. capitalization, sizes, and fonts) at the word level, hindering the accurate conveyance of speaker intent that is expressed in the tones or intonations of the speech. For example, ''YOU should do this'' tends to be considered as indicating ''You'' as the focus of the sentence, whereas ''You should do THIS'' tends to be ''This'' as the focus. This paper proposes a solution that changes the text decorations at the word level in real time. As a prototype, we developed an application that adjusts word size based on the loudness of each spoken word. Feedback from users implies that this system helped to convey the speaker's intent, offering a more engaging and accessible captioning experience.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Human-Computer Interaction

Died the same way β€” πŸ‘» Ghosted