๐
๐
Old Age
Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study
November 08, 2022 ยท Entered Twilight ยท ๐ IEEE Workshop/Winter Conference on Applications of Computer Vision
Repo contents: README.md, dataset, distiller_zoo, helper, models, result, train_student_MIXSTD.py
Authors
Hongjun Choi, Eun Som Jeon, Ankita Shukla, Pavan Turaga
arXiv ID
2211.03946
Category
cs.CV: Computer Vision
Citations
9
Venue
IEEE Workshop/Winter Conference on Applications of Computer Vision
Repository
https://github.com/hchoi71/MIX-KD
โญ 3
Last Checked
2 months ago
Abstract
Mixup is a popular data augmentation technique based on creating new samples by linear interpolation between two given data samples, to improve both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, which involves using a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different, however, we found that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of a mixup in knowledge distillation. In this paper, we present a detailed empirical study on various important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of the networks trained with a mixup in the light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies to guide the student network to enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners that commonly use techniques from KD. Our code is available at https://github.com/hchoi71/MIX-KD.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted