Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models
June 05, 2023 ยท Entered Twilight ยท ๐ Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Repo contents: README.md, config.yaml, demo.py, finetune, gen_ood_mix.py, get_aug_image.py, merge_layout.py, mixup_image.py, models, preprocess, test, utils
Authors
Jiabang He, Yi Hu, Lei Wang, Xing Xu, Ning Liu, Hui Liu, Heng Tao Shen
arXiv ID
2306.02623
Category
cs.CV: Computer Vision
Cross-listed
cs.CL,
cs.MM
Citations
6
Venue
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Repository
https://github.com/MAEHCM/Do-GOOD
โญ 6
Last Checked
2 months ago
Abstract
Numerous pre-training techniques for visual document understanding (VDU) have recently shown substantial improvements in performance across a wide range of document tasks. However, these pre-trained VDU models cannot guarantee continued success when the distribution of test data differs from the distribution of training data. In this paper, to investigate how robust existing pre-trained VDU models are to various distribution shifts, we first develop an out-of-distribution (OOD) benchmark termed Do-GOOD for the fine-Grained analysis on Document image-related tasks specifically. The Do-GOOD benchmark defines the underlying mechanisms that result in different distribution shifts and contains 9 OOD datasets covering 3 VDU related tasks, e.g., document information extraction, classification and question answering. We then evaluate the robustness and perform a fine-grained analysis of 5 latest VDU pre-trained models and 2 typical OOD generalization algorithms on these OOD datasets. Results from the experiments demonstrate that there is a significant performance gap between the in-distribution (ID) and OOD settings for document images, and that fine-grained analysis of distribution shifts can reveal the brittle nature of existing pre-trained VDU models and OOD generalization algorithms. The code and datasets for our Do-GOOD benchmark can be found at https://github.com/MAEHCM/Do-GOOD.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted