Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description

December 12, 2023 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Mianzhi Pan, Jianfei Li, Mingyue Yu, Zheng Ma, Kanzhi Cheng, Jianbing Zhang, Jiajun Chen
arXiv ID: 2312.07294
Category: cs.MM (Multimedia)
Citations: 0
Venue: arXiv.org
Last Checked: 4 months ago

Abstract

Commonsense reasoning, the ability to make logical assumptions about daily scenes, is one core intelligence of human beings. In this work, we present a novel task and dataset for evaluating the ability of text-to-image generative models to conduct commonsense reasoning, which we call PAINTaboo. Given a description with few visual clues of one object, the goal is to generate images illustrating the object correctly. The dataset was carefully hand-curated and covered diverse object categories to analyze model performance comprehensively. Our investigation of several prevalent text-to-image generative models reveals that these models are not proficient in commonsense reasoning, as anticipated. We trust that PAINTaboo can improve our understanding of the reasoning abilities of text-to-image generative models.

📜 Similar Papers

In the same crypt — Multimedia

🌅 Old Age

Quality Assessment of In-the-Wild Videos

Dingquan Li, Tingting Jiang, Ming Jiang

cs.MM 🏛 ACM MM 📚 375 cites 6 years ago

R.I.P. 👻 Ghosted

Viewport-Adaptive Navigable 360-Degree Video Delivery

Xavier Corbillon, Gwendal Simon, ... (+2 more)

cs.MM 🏛 ICC 📚 328 cites 9 years ago

📚 The Cartographer

A Comprehensive Survey on Cross-modal Retrieval

Kaiye Wang, Qiyue Yin, ... (+3 more)

cs.MM 🏛 arXiv 📚 322 cites 9 years ago

📚 The Cartographer

An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges

Yuxin Peng, Xin Huang, Yunzhen Zhao

cs.MM 🏛 IEEE TCSVT 📚 309 cites 9 years ago

R.I.P. 👻 Ghosted

A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding

Yuanying Dai, Dong Liu, Feng Wu

cs.MM 🏛 ICMM 📚 305 cites 9 years ago

R.I.P. 👻 Ghosted

Video Generation From Text

Yitong Li, Martin Renqiang Min, ... (+3 more)

cs.MM 🏛 AAAI 📚 300 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago