Large Language Models Meet Computer Vision: A Brief Survey

November 28, 2023 · The Cartographer · 🏛 arXiv.org

"No code URL or promise found in abstract"
"Title-pattern auto-detect: Large Language Models Meet Computer Vision: A Brief Survey"

Evidence collected by the PWNC Scanner

Authors Raby Hamadi arXiv ID 2311.16673 Category cs.CV: Computer Vision Cross-listed cs.AI Citations 6 Venue arXiv.org Last Checked 3 days ago

Abstract

Recently, the intersection of Large Language Models (LLMs) and Computer Vision (CV) has emerged as a pivotal area of research, driving significant advancements in the field of Artificial Intelligence (AI). As transformers have become the backbone of many state-of-the-art models in both Natural Language Processing (NLP) and CV, understanding their evolution and potential enhancements is crucial. This survey paper delves into the latest progressions in the domain of transformers and their subsequent successors, emphasizing their potential to revolutionize Vision Transformers (ViTs) and LLMs. This survey also presents a comparative analysis, juxtaposing the performance metrics of several leading paid and open-source LLMs, shedding light on their strengths and areas of improvement as well as a literature review on how LLMs are being used to tackle vision related tasks. Furthermore, the survey presents a comprehensive collection of datasets employed to train LLMs, offering insights into the diverse data available to achieve high performance in various pre-training and downstream tasks of LLMs. The survey is concluded by highlighting open directions in the field, suggesting potential venues for future research and development. This survey aims to underscores the profound intersection of LLMs on CV, leading to a new era of integrated and advanced AI models.