Investigating Acoustic-Textual Emotional Inconsistency Information for Automatic Depression Detection

December 09, 2024 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Rongfeng Su, Changqing Xu, Xinyi Wu, Feng Xu, Xie Chen, Lan Wangt, Nan Yan arXiv ID 2412.18614 Category eess.AS: Audio & Speech Cross-listed cs.AI, cs.CL Citations 1 Venue arXiv.org Last Checked 3 months ago

Abstract

Previous studies have demonstrated that emotional features from a single acoustic sentiment label can enhance depression diagnosis accuracy. Additionally, according to the Emotion Context-Insensitivity theory and our pilot study, individuals with depression might convey negative emotional content in an unexpectedly calm manner, showing a high degree of inconsistency in emotional expressions during natural conversations. So far, few studies have recognized and leveraged the emotional expression inconsistency for depression detection. In this paper, a multimodal cross-attention method is presented to capture the Acoustic-Textual Emotional Inconsistency (ATEI) information. This is achieved by analyzing the intricate local and long-term dependencies of emotional expressions across acoustic and textual domains, as well as the mismatch between the emotional content within both domains. A Transformer-based model is then proposed to integrate this ATEI information with various fusion strategies for detecting depression. Furthermore, a scaling technique is employed to adjust the ATEI feature degree during the fusion process, thereby enhancing the model's ability to discern patients with depression across varying levels of severity. To best of our knowledge, this work is the first to incorporate emotional expression inconsistency information into depression detection. Experimental results on a counseling conversational dataset illustrate the effectiveness of our method.