5th Place Solution to Kaggle Google Universal Image Embedding Competition

October 18, 2022 · arXiv.org

Repo contents: .gitignore, LICENSE, README.md, directory_structure.txt, make_torchscript_wcls.py, models, preprocess_GPR1200.py, preprocess_food101.py, preprocess_glr2021_products10k.py, requirements.txt, src, tfrec2png.py, train_arcface.py, train_classifier.py

Authors: Noriaki Ota, Shingo Yokoi, Shinsuke Yamaoka
arXiv ID: 2210.09495
Category: cs.CV (Computer Vision)
Citations: 2
Venue: arXiv.org
Repository: https://github.com/riron1206/kaggle-Google-Universal-Image-Embedding-Competition-5th-Place-Solution

Abstract

In this paper, we present our solution, which placed 5th in the 2022 Kaggle Google Universal Image Embedding Competition. We use the ViT-H visual encoder of CLIP from the OpenCLIP repository as the backbone and train a head model, composed of batch normalization and linear layers, with the ArcFace loss. The training data is a subset of Products10K, GLDv2, GPR1200, and Food101. Applying test-time augmentation (TTA) to part of the images further improves the score. With this method, we achieve a score of 0.684 on the public leaderboard and 0.688 on the private leaderboard. Our code is available at https://github.com/riron1206/kaggle-Google-Universal-Image-Embedding-Competition-5th-Place-Solution
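
To make the ArcFace step in the abstract concrete, the following is a minimal, dependency-free sketch of how ArcFace logits are typically computed for a single embedding. It is not taken from the authors' repository: the function names (`l2_normalize`, `arcface_logits`, `cross_entropy`) and the `scale` and `margin` values are assumptions chosen for illustration, and the backbone and the batch-normalization/linear head are left out entirely.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, label, scale=30.0, margin=0.5):
    """Return ArcFace logits for one sample.

    embedding:     list of floats (output of the head, hypothetical here)
    class_weights: one weight vector per class
    label:         index of the ground-truth class
    scale, margin: illustrative values, not the paper's settings
    """
    e = l2_normalize(embedding)
    logits = []
    for k, w in enumerate(class_weights):
        w_hat = l2_normalize(w)
        # Cosine similarity between unit vectors, clamped for acos safety.
        cos = sum(a * b for a, b in zip(e, w_hat))
        cos = max(-1.0, min(1.0, cos))
        if k == label:
            # Add the angular margin to the target class only.
            # Plain variant: no special handling when theta + margin > pi.
            theta = math.acos(cos)
            cos = math.cos(theta + margin)
        logits.append(scale * cos)
    return logits


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


# Toy usage: a 2-D embedding aligned with class 0.
weights = [[1.0, 0.0], [0.0, 1.0]]
with_margin = arcface_logits([2.0, 0.0], weights, label=0, scale=10.0, margin=0.5)
no_margin = arcface_logits([2.0, 0.0], weights, label=0, scale=10.0, margin=0.0)
loss_with_margin = cross_entropy(with_margin, 0)
loss_no_margin = cross_entropy(no_margin, 0)
```

In this toy case the margin lowers the target-class logit, so the loss is higher than with plain cosine softmax for the same embedding. That extra penalty is what pushes embeddings of the same class closer together during training.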