Fully-attentive and interpretable: vision and video vision transformers for pain detection

October 27, 2022 ยท Entered Twilight ยท ๐Ÿ› arXiv.org

๐Ÿ’ค TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: Neurips_Transformer_Code_Final.ipynb

Authors Giacomo Fiorentini, Itir Onal Ertugrul, Albert Ali Salah arXiv ID 2210.15769 Category cs.CV: Computer Vision Citations 6 Venue arXiv.org Repository https://github.com/IPDTFE/ViT-McMaster โญ 2 Last Checked 3 months ago
Abstract
Pain is a serious and costly issue globally, but to be treated, it must first be detected. Vision transformers are a top-performing architecture in computer vision, with little research on their use for pain detection. In this paper, we propose the first fully-attentive automated pain detection pipeline that achieves state-of-the-art performance on binary pain detection from facial expressions. The model is trained on the UNBC-McMaster dataset, after faces are 3D-registered and rotated to the canonical frontal view. In our experiments we identify important areas of the hyperparameter space and their interaction with vision and video vision transformers, obtaining 3 noteworthy models. We analyse the attention maps of one of our models, finding reasonable interpretations for its predictions. We also evaluate Mixup, an augmentation technique, and Sharpness-Aware Minimization, an optimizer, with no success. Our presented models, ViT-1 (F1 score 0.55 +- 0.15), ViViT-1 (F1 score 0.55 +- 0.13), and ViViT-2 (F1 score 0.49 +- 0.04), all outperform earlier works, showing the potential of vision transformers for pain detection. Code is available at https://github.com/IPDTFE/ViT-McMaster
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computer Vision

๐ŸŒ… ๐ŸŒ… Old Age

Fast R-CNN

Ross Girshick

cs.CV ๐Ÿ› ICCV ๐Ÿ“š 27.7K cites 11 years ago